BERRY-ESSEEN BOUNDS IN THE ENTROPIC 
CENTRAL LIMIT THEOREM 



S. G. BOBKOV, G. P. CHISTYAKOV, AND F. GÖTZE

Abstract. Berry-Esseen-type bounds for total variation and relative entropy distances to 
the normal law are established for the sums of non-i.i.d. random variables. 



1. Introduction 

Let $X_1, \dots, X_n$ be independent (not necessarily identically distributed) random variables
with mean $\mathbf{E} X_k = 0$ and finite variances $\sigma_k^2 = \mathbf{E} X_k^2$ ($\sigma_k > 0$). Put $B_n = \sum_{k=1}^n \sigma_k^2$. Under
additional moment assumptions, the normalized sum

\[ S_n = \frac{X_1 + \dots + X_n}{\sqrt{B_n}} \]

has approximately a standard normal distribution in a weak sense. Moreover, the closeness
of the distribution function $F_n(x) = \mathbf{P}\{S_n \le x\}$ to the standard normal distribution
function

\[ \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy \]

has been studied intensively in terms of the so-called Lyapunov ratios

\[ L_s = \frac{\sum_{k=1}^n \mathbf{E}|X_k|^s}{B_n^{s/2}}. \]

In particular, if all $X_k$ have finite third absolute moments, the classical Berry-Esseen theorem
says that

\[ \sup_x |F_n(x) - \Phi(x)| \le C L_3, \tag{1.1} \]

where $C$ is an absolute constant (cf. e.g. [E], [F], [Pe]).

One of the most remarkable features of (1.1) is that the number of summands does not
explicitly appear in it, while in the i.i.d. case, that is, when the $X_k$ have equal distributions, $L_3$
is of order $n^{-1/2}$, which is best possible for the Kolmogorov distance under the 3rd moment
condition.
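As a small illustration of the Lyapunov ratios, the following sketch evaluates $L_s$ from given moments and confirms the order $n^{-1/2}$ of $L_3$ in the i.i.d. case. The function names `lyapunov_ratio` and `l3_iid` are ours and purely illustrative; the example takes standard normal summands, for which $\mathbf{E}|X_1|^3 = 2\sqrt{2/\pi}$.

```python
import math

def lyapunov_ratio(abs_moments_s, variances, s):
    """L_s = sum_k E|X_k|^s / B_n^(s/2), where B_n is the sum of the variances."""
    b_n = sum(variances)
    return sum(abs_moments_s) / b_n ** (s / 2)

def l3_iid(n, rho3, sigma2=1.0):
    """L_3 for n i.i.d. summands with E|X_1|^3 = rho3 and Var(X_1) = sigma2."""
    return lyapunov_ratio([rho3] * n, [sigma2] * n, 3)

# Standard normal summands: E|X|^3 = 2 * sqrt(2 / pi).
rho3 = 2.0 * math.sqrt(2.0 / math.pi)
for n in (10, 100, 1000):
    # The second and third columns agree: L_3 = E|X_1|^3 / sqrt(n).
    print(n, l3_iid(n, rho3), rho3 / math.sqrt(n))
```

In the non-i.i.d. case the same function applies with unequal entries in the two lists, and no explicit dependence on $n$ remains.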

In this paper we shall prove bounds for stronger distances between $F_n$ and $\Phi$, such as total
variation $\|F_n - \Phi\|_{\mathrm{TV}}$ and relative entropy $D(F_n\|\Phi)$. However, these distances are clearly
useless, for example, when all summands have discrete distributions. Therefore, some further
assumptions are needed.

When estimating the error of normal approximation by means of these distances, it seems
natural to require that every $X_k$ has an absolutely continuous distribution. Even with this
assumption we cannot exclude the case that our distances of $S_n$ to the normal law may be
growing when the distributions of $X_k$ get near to discrete distributions. Thus we shall assume
that the densities of $X_k$ are bounded on a reasonably large part of the real line. This can be
guaranteed quite naturally, for instance, by using the entropy functional, defined for a random
variable $X$ with density $p(x)$ by

\[ h(X) = -\int_{-\infty}^{+\infty} p(x) \log p(x)\, dx. \]

Once $X$ has a finite second moment, the entropy is well-defined as a Lebesgue integral, although
the value $h(X) = -\infty$ is possible. Introduce a related functional

\[ D(X) = h(Z) - h(X) = \int_{-\infty}^{+\infty} p(x) \log \frac{p(x)}{\varphi_{a,\sigma}(x)}\, dx, \]

where $Z$ is a normal random variable with density $\varphi_{a,\sigma}$ having the same mean $a$ and variance
$\sigma^2$ as $X$. Note that this functional is affine invariant, that is, $D(c_0 + c_1 X) = D(X)$ for all
$c_0 \in \mathbf{R}$, $c_1 \ne 0$, and in this sense it depends neither on the mean nor on the variance of $X$.

The quantity $D(X)$, denoted also $D(F_X\|F_Z)$, where $F_X$ and $F_Z$ are the corresponding
distributions of $X$ and $Z$, is known as the "entropic distance to normality or Gaussianity".
It may be characterized as the shortest Kullback-Leibler distance from $F_X$ to the class of all
normal laws on the real line. In general, $0 \le D(X) \le +\infty$, and the equality $D(X) = 0$ is
possible only when $X$ is normal. Moreover, by Pinsker's inequality, the entropic distance
dominates the total variation in the sense that

\[ D(X) \ge \frac{1}{2}\, \|F_X - F_Z\|_{\mathrm{TV}}^2. \]

Thus, the size of $D(X)$ provides a strong distance of $F_X$ to normality, while finiteness of
$D(X)$ guarantees that $F_X$ is separated from the class of discrete probability distributions.
Using $D$ for both purposes, one may obtain refinements of Berry-Esseen's inequality (1.1) in
terms of the total variation and the entropic distances to normality for the distributions $F_n$.
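To make the entropic distance and Pinsker's inequality concrete, here is a minimal numerical sketch for the uniform law. It assumes that $\|\cdot\|_{\mathrm{TV}}$ denotes the total variation norm, that is, the $L^1$-distance between densities (the convention under which Pinsker's inequality has the constant $\frac12$). The helper names are ours; the closed forms $h = \log(\text{width})$ and $\mathrm{Var} = \text{width}^2/12$ for the uniform distribution are standard.

```python
import math

def entropic_distance(entropy, variance):
    """D(X) = h(Z) - h(X), where Z is normal with the same variance as X."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance) - entropy

def uniform_D(width):
    # Uniform law on an interval of length `width`: h = log(width), Var = width^2 / 12.
    return entropic_distance(math.log(width), width ** 2 / 12.0)

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def tv_uniform_vs_normal(steps=100_000, cutoff=10.0):
    """L1-distance between the standardized uniform density and phi (midpoint rule)."""
    half = math.sqrt(3.0)  # the uniform law on [-sqrt(3), sqrt(3)] has variance 1
    dx = 2.0 * cutoff / steps
    total = 0.0
    for i in range(steps):
        x = -cutoff + (i + 0.5) * dx
        p = 1.0 / (2.0 * half) if abs(x) <= half else 0.0
        total += abs(p - phi(x)) * dx
    return total

d = uniform_D(2.0 * math.sqrt(3.0))
tv = tv_uniform_vs_normal()
# Affine invariance: uniform_D does not depend on the width.
print(d, uniform_D(5.0), tv, 0.5 * tv ** 2)
```

The value of $D$ is the same for every width, in line with the affine invariance noted above, and it dominates $\frac12 \|F_X - F_Z\|_{\mathrm{TV}}^2$.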



Theorem 1.1. Let $D$ be a non-negative real number. Assume that $X_k$ have finite third
absolute moments, and $D(X_k) \le D$ ($1 \le k \le n$). Then

\[ \|F_n - \Phi\|_{\mathrm{TV}} \le C L_3, \tag{1.2} \]

where the constant $C$ depends on $D$ only.



In particular, if all $X_k$ are equidistributed with $\mathbf{E} X_1^2 = 1$, we get

\[ \|F_n - \Phi\|_{\mathrm{TV}} \le \frac{C}{\sqrt{n}}\, \mathbf{E}|X_1|^3 \tag{1.3} \]

with a constant $C$ depending on $D(X_1)$ only. Although (1.2)-(1.3) seem to be new, related
estimates in the i.i.d. case were studied by many authors. For example, in the early 1960s
Mamatov and Sirazhdinov [M-S] found an exact asymptotic $\|F_n - \Phi\|_{\mathrm{TV}} = \frac{c}{\sqrt{n}} + o\big(\frac{1}{\sqrt{n}}\big)$, where
the constant $c$ is proportional to $|\mathbf{E} X_1^3|$, and which holds under the assumption that the
distribution of $X_1$ has a non-trivial absolutely continuous component (cf. also [Pr], [Se]).
Now, let us turn to the entropic distance to normality.

Theorem 1.2. Assume that $X_k$ have finite fourth absolute moments, and that $D(X_k) \le D$
($1 \le k \le n$). Then

\[ D(S_n) \le C L_4, \tag{1.4} \]

where $C$ depends on $D$ only.

In (1.2) and (1.4) one may take $C = e^{c(D+1)}$, where $c$ is an absolute constant. Moreover,
$C$ can be chosen to be just a numerical constant, provided that $D$ is not too large, namely, if
$D \le c_0 \log\frac{1}{L_3}$ and $D \le c_0 \log\frac{1}{L_4}$, respectively (with $c_0 > 0$ absolute).

These Berry-Esseen-type estimates are consistent in view of the Pinsker-type inequality.
In some sense, one may consider (1.4) as a stronger assertion than (1.2), which is indeed the
case when $L_4$ is of order $L_3^2$. (In general $L_3^2 \le L_4$.)

In the i.i.d. case as in (1.3), the inequality (1.4) becomes

\[ D(S_n) \le \frac{C}{n}\, \mathbf{E} X_1^4, \]

where $C$ depends on $D(X_1)$ only. Thus, we obtain an error bound of order $O(1/n)$ under the
4th moment assumption. Note that the property $D(S_n) \to 0$ always holds under the second
moment assumption (with finite entropy of $X_1$). This is the statement of the entropic central
limit theorem, which is due to Barron [B]. Here, the convergence may have an arbitrarily
slow rate. Nevertheless, the expected typical rate $D(S_n) = O(\frac{1}{n})$ was known to hold in some
cases, for example, when $X_1$ has a distribution satisfying an integro-differential inequality
of Poincaré-type. These results are due to Artstein, Ball, Barthe and Naor [A-B-B-N], and
Barron and Johnson [B-J]; cf. also [J]. Recently, an exact asymptotic for $D(S_n)$ has been
studied in [B-C-G1]. If the entropy and the 4th moment of $X_1$ are finite, it was shown that

\[ D(S_n) = \frac{c}{n} + o\Big(\frac{1}{n \log n}\Big), \qquad c = \frac{1}{12}\, (\mathbf{E} X_1^3)^2. \]

Moreover, with finite 3rd absolute moment (and infinite 4th moment) such a relation may not
hold, and it may happen that $D(S_n) \ge n^{-(1/2 + \varepsilon)}$ for all $n$ large enough with a given prescribed
$\varepsilon > 0$. This holds, for example, when $X_1$ has density

\[ p(x) = \int_0^{+\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/(2\sigma^2)}\, dP(\sigma), \]

where P is a probability measure on (^, +00) with density — = ((Tlogcj)^^ for a > e and 
with an arbitrary extension to the interval ^ < a < e satisfying Jy"^ a'^ dP{a) = 1. 

Therefore, in the general non-i.i.d. case, the Lyapunov coefficient $L_3$ cannot be taken as
an appropriate quantity for bounding the error in Theorem 1.2, and $L_4$ seems more relevant.
This is also suggested by the result of [A-B-B-N] for the weighted sums

\[ S_n = a_1 X_1 + \dots + a_n X_n \qquad (a_1^2 + \dots + a_n^2 = 1) \]

of i.i.d. random variables $X_k$, such that $\mathbf{E} X_1 = 0$ and $\mathbf{E} X_1^2 = 1$. Namely, it is proved there
that

\[ D(S_n) \le \frac{L(a)}{c/2 + (1 - c/2)\, L(a)}\, D(X_1), \tag{1.5} \]

where $L(a) = a_1^4 + \dots + a_n^4$ and $c \ge 0$ is an optimal constant in the Poincaré-type inequality
$c\, \mathrm{Var}(u(X_1)) \le \mathbf{E}\, u'(X_1)^2$. But for the sequence $a_k X_k$ and $s = 4$, the corresponding Lyapunov
coefficient is exactly $L_4 = L(a)\, \mathbf{E} X_1^4$. Therefore, when $c = c(X_1)$ is positive, (1.5) yields the
estimate

\[ D(S_n) \le \frac{2 D(X_1)}{c\, \mathbf{E} X_1^4}\, L_4, \]

which is of a similar nature as (1.4).

Another interesting feature of (1.4) is that it may be connected with transportation cost
inequalities for the distributions $F_n$ of $S_n$ in terms of the quadratic Wasserstein distance $W_2$.
For random variables $X$ and $Z$ with finite second moments and distributions $F_X$ and $F_Z$, this
distance is defined by

\[ W_2^2(F_X, F_Z) = \inf_\pi \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} |x - y|^2\, d\pi(x, y), \]

where the infimum is taken over all probability measures $\pi$ on the plane with marginals $F_X$
and $F_Z$. The value $W_2^2(F_X, F_Z)$ is interpreted as the minimal expenses needed to transport
$F_Z$ to $F_X$, provided that it costs $|x - y|^2$ to move any "particle" $x$ to any "particle" $y$.

The metric $W_2$ is of weak type in the sense that it can be used to metrize the weak
convergence of probability distributions ([V]). Moreover, if $Z$ is standard normal and if $X$ has a
density, $W_2(F_X, F_Z)$ may be bounded in terms of the relative entropy by virtue of Talagrand's
transportation inequality

\[ W_2^2(F_X, F_Z) \le 2 D(F_X\|F_Z) \tag{1.6} \]

(cf. [T], or [B-G] for a different approach). If additionally $X$ has mean zero and unit variance,
$D(F_X\|F_Z) = D(X)$. Hence, applying (1.6) with $X = S_n$, we get, by Theorem 1.2,

\[ W_2(F_n, \Phi) \le C \sqrt{L_4}, \tag{1.7} \]

where $C$ depends on $D$. In fact, this inequality holds true with an absolute constant. This
result is due to Rio [Ri], who also studied more general Wasserstein distances $W_r$, by relating
them to Zolotarev's "ideal" metrics. It has also been noticed in [Ri] that the 4th moment
condition is essential, so the Lyapunov ratio $L_4$ in (1.7) cannot be replaced with $L_3$, including
the i.i.d. case (like in Theorem 1.2).

The paper is organized according to the following plan.

2. General Bounds on Total Variation and Entropic Distance

Let a random variable $X$ have an absolutely continuous distribution $F$ with density $p(x)$
and finite first absolute moment. We do not require that it has mean zero and/or unit variance.

First, we recall an elementary bound for the total variation distance $\|F - \Phi\|_{\mathrm{TV}}$ in terms
of the characteristic function

\[ f(t) = \int_{-\infty}^{+\infty} e^{itx} p(x)\, dx \qquad (t \in \mathbf{R}). \]

Introduce the characteristic function $g(t) = e^{-t^2/2}$ of the standard normal law.
In the sequel, we use the notation

\[ \|u\|_2 = \Big(\int_{-\infty}^{+\infty} |u(t)|^2\, dt\Big)^{1/2} \]

to denote the $L^2$-norm of a measurable complex-valued function $u$ on the real line (with respect
to Lebesgue measure).



Proposition 2.1. We have

\[ \|F - \Phi\|_{\mathrm{TV}}^2 \le \frac{1}{2}\, \|f - g\|_2^2 + \frac{1}{2}\, \|f' - g'\|_2^2. \tag{2.1} \]

This bound is standard (cf. e.g. [I-L], Lemma 1.3.1). In fact, the inequality (2.1) continues to
hold for an arbitrary probability distribution (in place of $\Phi$) with finite first absolute moment
and characteristic function $g$. However, the general case will not be needed in the sequel.

Note that the assumption $\mathbf{E}|X| < +\infty$ guarantees that $f$ is continuously differentiable, so
that the last norm in (2.1) makes sense.
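The bound of Proposition 2.1 can be tested numerically on a simple family. The sketch below assumes that (2.1) reads $\|F - \Phi\|_{\mathrm{TV}}^2 \le \frac12 \|f - g\|_2^2 + \frac12 \|f' - g'\|_2^2$ with $\|\cdot\|_{\mathrm{TV}}$ the $L^1$-distance between densities, and takes $X \sim N(0, s^2)$, for which $f(t) = e^{-s^2 t^2/2}$. The helper names `integrate` and `check_prop_2_1` are ours.

```python
import math

def integrate(fn, lo, hi, steps=20_000):
    """Midpoint rule on [lo, hi]."""
    dx = (hi - lo) / steps
    return sum(fn(lo + (i + 0.5) * dx) for i in range(steps)) * dx

def check_prop_2_1(s):
    """Return (left side, right side) of the assumed form of (2.1) for X ~ N(0, s^2)."""
    p = lambda x: math.exp(-x * x / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
    phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    # f(t) - g(t) and f'(t) - g'(t) for f(t) = exp(-s^2 t^2 / 2), g(t) = exp(-t^2 / 2).
    f_minus_g = lambda t: math.exp(-s * s * t * t / 2) - math.exp(-t * t / 2)
    df_minus_dg = lambda t: (-s * s * t * math.exp(-s * s * t * t / 2)
                             + t * math.exp(-t * t / 2))
    cut = 12.0 * max(1.0, s, 1.0 / s)  # integration range wide enough for all tails
    tv = integrate(lambda x: abs(p(x) - phi(x)), -cut, cut)
    norm0 = integrate(lambda t: f_minus_g(t) ** 2, -cut, cut)
    norm1 = integrate(lambda t: df_minus_dg(t) ** 2, -cut, cut)
    return tv ** 2, 0.5 * norm0 + 0.5 * norm1

for s in (0.5, 0.8, 1.5, 3.0):
    print(s, check_prop_2_1(s))
```

In each case the first number stays below the second, and both tend to zero as $s \to 1$.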

Let $Z$ be a standard normal random variable, with density $\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$. Consider
the relative entropy

\[ D(X\|Z) = D(F\|\Phi) = \int_{-\infty}^{+\infty} p(x) \log \frac{p(x)}{\varphi(x)}\, dx. \tag{2.2} \]

As a preliminary bound, we first derive:

Lemma 2.2. For all $T \ge 0$,

\[ D(X\|Z) \le e^{-T^2/2} + \sqrt{2\pi} \int_{-T}^{T} (p(x) - \varphi(x))^2\, e^{x^2/2}\, dx
   + \frac{1}{2} \int_{|x|>T} x^2 p(x)\, dx + \int_{|x|>T} p(x) \log p(x)\, dx. \tag{2.3} \]

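Proof. We split the integral in (2.2) into the two regions. For the interval $|x| \le T$, using
the elementary inequality $t \log t \le (t - 1) + (t - 1)^2$, $t \ge 0$, we have

\[ \int_{-T}^{T} \frac{p}{\varphi} \log \frac{p}{\varphi}\ \varphi\, dx
   \le \int_{-T}^{T} \Big(\frac{p}{\varphi} - 1\Big) \varphi\, dx + \int_{-T}^{T} \Big(\frac{p}{\varphi} - 1\Big)^2 \varphi\, dx \]
\[ = \int_{|x|>T} (\varphi - p)\, dx + \int_{-T}^{T} \frac{(p - \varphi)^2}{\varphi}\, dx \]
\[ = 2\,(1 - \Phi(T)) - \int_{|x|>T} p(x)\, dx + \sqrt{2\pi} \int_{-T}^{T} (p(x) - \varphi(x))^2\, e^{x^2/2}\, dx. \]

For the second region, just write

\[ \int_{|x|>T} p(x) \log \frac{p(x)}{\varphi(x)}\, dx = \int_{|x|>T} p(x) \log p(x)\, dx
   + \log\sqrt{2\pi} \int_{|x|>T} p(x)\, dx + \frac{1}{2} \int_{|x|>T} x^2 p(x)\, dx. \]

It remains to collect these relations and use $\log\sqrt{2\pi} < 1$ together with a well-known elementary
inequality $1 - \Phi(T) \le \frac{1}{2}\, e^{-T^2/2}$. Thus, Lemma 2.2 is proved.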
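The three elementary facts used in this proof, namely $t \log t \le (t-1) + (t-1)^2$ for $t \ge 0$, the Gaussian tail bound $1 - \Phi(T) \le \frac12 e^{-T^2/2}$ for $T \ge 0$, and $\log\sqrt{2\pi} < 1$, are easy to confirm on a grid. The following check is only a sanity test, not a proof; the function names are ours.

```python
import math

def normal_tail(t):
    """1 - Phi(t) for the standard normal distribution function Phi."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def tlogt_gap(t):
    """(t - 1) + (t - 1)^2 - t log t, with the convention 0 log 0 = 0."""
    tlogt = t * math.log(t) if t > 0 else 0.0
    return (t - 1.0) + (t - 1.0) ** 2 - tlogt

grid = [k / 100.0 for k in range(0, 2001)]  # points of [0, 20]
min_gap = min(tlogt_gap(t) for t in grid)   # equals 0, attained at t = 0 and t = 1
tail_ok = all(normal_tail(t) <= 0.5 * math.exp(-0.5 * t * t) for t in grid)
print(min_gap, tail_ok, math.log(math.sqrt(2.0 * math.pi)))
```

The gap in the first inequality equals $t\,(t - 1 - \log t)$, which explains why it vanishes exactly at $t = 0$ and $t = 1$.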

Remark. If $p$ is bounded by a constant $M$, the estimate (2.3) yields

\[ D(X\|Z) \le e^{-T^2/2} + \sqrt{2\pi} \int_{-T}^{T} (p(x) - \varphi(x))^2\, e^{x^2/2}\, dx
   + \frac{1}{2} \int_{|x|>T} x^2 p(x)\, dx + \log M \int_{|x|>T} p(x)\, dx. \]

This bound might be of interest in other applications, although it involves the maximum of
the density. For our purposes, the important integral in (2.3), $\int_{|x|>T} p(x) \log p(x)\, dx$, will be
bounded in a different way and in terms of the characteristic functions, without involving the
parameter $M$.

3. Entropic Distance and Edgeworth-type Approximation 

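To estimate the integrals in (2.3) in terms of the characteristic functions like in Proposition
2.1, define

\[ \varphi_\alpha(x) = \varphi(x) \Big(1 + \alpha\, \frac{x^3 - 3x}{3!}\Big), \]

where $\alpha$ is a parameter. These functions appear with $\alpha$ proportional to $n^{-1/2}$ in the Edgeworth-type
expansions for the densities of the normalized sums of i.i.d. summands. In the non-i.i.d. case
such expansions hold as well with $\alpha = B_n^{-3/2} \sum_{k=1}^n \mathbf{E} X_k^3$.

Note that every $\varphi_\alpha$ has the Fourier transform

\[ g_\alpha(t) = \int_{-\infty}^{+\infty} e^{itx} \varphi_\alpha(x)\, dx = g(t) \Big(1 + \alpha\, \frac{(it)^3}{3!}\Big). \]

Proposition 3.1. Let $X$ be a random variable with finite third absolute moment. For all $\alpha \in \mathbf{R}$,

\[ D(X\|Z) \le \alpha^2 + 4\, \big(\|f - g_\alpha\|_2 + \|f''' - g_\alpha'''\|_2\big), \tag{3.1} \]

where $Z$ is a standard normal random variable and $f$ is the characteristic function of $X$.

The assumption on the 3rd absolute moment is needed to ensure that $f$ has first three
continuous derivatives.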

As a particular case, the inequality (3.1) is valid for $\alpha = 0$ as well. Then it becomes

\[ D(X\|Z) \le 4\, \big(\|f - g\|_2 + \|f''' - g'''\|_2\big), \]

which may be viewed as a full analog of Proposition 2.1. However, with properly chosen values
of $\alpha$, (3.1) may provide a much better asymptotic approximation (especially when applying it
to the sums of independent random variables).

Proof. We may assume that the characteristic function $f$ and its first three derivatives
are square integrable, so that the right-hand side of (3.1) is finite. Note that in this case, $X$
has an absolutely continuous distribution with some density $p$.

We apply Lemma 2.2. Given $T \ge 0$ to be specified later on, let us start with the estimation
of the last integral in (2.3). Define the even function $\tilde p(x) = p(x) + p(-x)$, so that $p \log p \le
p \log^+ \tilde p$ (where we use the notation $a^+ = \max\{a, 0\}$). Subtracting $\varphi_\alpha(x)$ from $p(x)$ and then
adding, one can write

\[ \int_{|x|>T} p(x) \log p(x)\, dx \le \int_{|x|>T} |p(x) - \varphi_\alpha(x)| \log^+ \tilde p(x)\, dx
   + \int_{|x|>T} \varphi_\alpha(x) \log^+ \tilde p(x)\, dx. \]

But the function $\varphi_\alpha - \varphi$ is odd, so the last integral does not depend on $\alpha$ and is equal to

\[ \int_{|x|>T} \varphi(x) \log^+ \tilde p(x)\, dx. \tag{3.2} \]

To estimate it from above, one may use Cauchy's inequality together with the elementary
bound $(\log^+ t)^2 \le C t$, where the optimal constant $C$ is equal to $4 e^{-2}$. Since $\int_{-\infty}^{+\infty} \tilde p(x)\, dx = 2$,
(3.2) does not exceed

\[ \Big(\int_{|x|>T} (\log^+ \tilde p(x))^2\, dx\Big)^{1/2} \Big(\int_{|x|>T} \varphi(x)^2\, dx\Big)^{1/2}
   \le \frac{2\sqrt{2}}{e}\, \Big(\int_{|x|>T} \varphi(x)^2\, dx\Big)^{1/2}. \]

On the other hand,

\[ \int_{|x|>T} \varphi(x)^2\, dx = \frac{1}{\sqrt{\pi}}\, \big(1 - \Phi(T\sqrt{2})\big) \le \frac{1}{2\sqrt{\pi}}\, e^{-T^2}, \]

where we applied the inequality $1 - \Phi(x) \le \frac{1}{2}\, e^{-x^2/2}$ ($x \ge 0$). Thus, using $\frac{2}{e\, \pi^{1/4}} < 1$ to
simplify the constant, we get

\[ \int_{|x|>T} p(x) \log p(x)\, dx \le \int_{-\infty}^{+\infty} |p(x) - \varphi_\alpha(x)| \log^+ \tilde p(x)\, dx + e^{-T^2/2}. \]

Here, again by the Cauchy inequality, the last integral does not exceed

\[ \frac{2\sqrt{2}}{e}\, \Big(\int_{-\infty}^{+\infty} (p(x) - \varphi_\alpha(x))^2\, dx\Big)^{1/2}
   = \frac{2\sqrt{2}}{e}\, \frac{1}{\sqrt{2\pi}}\, \Big(\int_{-\infty}^{+\infty} |f(t) - g_\alpha(t)|^2\, dt\Big)^{1/2}, \]

where we applied Plancherel's formula. The constant in front of the last integral is smaller
than $\frac{1}{2}$, so we arrive at the estimate

\[ \int_{|x|>T} p(x) \log p(x)\, dx \le \frac{1}{2}\, \|f - g_\alpha\|_2 + e^{-T^2/2}. \tag{3.3} \]

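Now, let us turn to the second-to-last integral in (2.3). Once more, subtracting $\varphi_\alpha(x)$ from $p(x)$
and then adding, one can write

\[ \int_{|x|>T} x^2 p(x)\, dx \le \int_{-\infty}^{+\infty} x^2\, |p(x) - \varphi_\alpha(x)|\, dx + \int_{|x|>T} x^2 \varphi_\alpha(x)\, dx. \]

Since the function $\varphi_\alpha - \varphi$ is odd, the last integral is equal to

\[ \int_{|x|>T} x^2 \varphi(x)\, dx = \frac{2}{\sqrt{2\pi}} \int_T^{+\infty} x^2 e^{-x^2/2}\, dx
   = 2\,(1 - \Phi(T)) + \frac{2}{\sqrt{2\pi}}\, T e^{-T^2/2} \]

(by direct integration by parts). Hence, using $2\,(1 - \Phi(T)) \le e^{-T^2/2}$ once more, we get

\[ \frac{1}{2} \int_{|x|>T} x^2 p(x)\, dx \le \frac{1}{2} \int_{-\infty}^{+\infty} x^2\, |p(x) - \varphi_\alpha(x)|\, dx
   + \frac{1}{2}\, e^{-T^2/2} + \frac{1}{\sqrt{2\pi}}\, T e^{-T^2/2}. \tag{3.4} \]

In addition, by Cauchy's inequality,

\[ \Big(\int_{-\infty}^{+\infty} x^2\, |p(x) - \varphi_\alpha(x)|\, dx\Big)^2
   \le \int_{-\infty}^{+\infty} \frac{dx}{1 + x^2} \int_{-\infty}^{+\infty} (1 + x^2)\, x^4\, (p(x) - \varphi_\alpha(x))^2\, dx \]
\[ = \pi \int_{-\infty}^{+\infty} (x^4 + x^6)\, (p(x) - \varphi_\alpha(x))^2\, dx
   \le \pi \int_{-\infty}^{+\infty} (1 + 2x^6)\, (p(x) - \varphi_\alpha(x))^2\, dx. \]

But, by Plancherel's formula,

\[ \int_{-\infty}^{+\infty} (p(x) - \varphi_\alpha(x))^2\, dx = \frac{1}{2\pi}\, \|f - g_\alpha\|_2^2, \tag{3.5} \]
\[ \int_{-\infty}^{+\infty} x^6\, (p(x) - \varphi_\alpha(x))^2\, dx = \frac{1}{2\pi}\, \|f''' - g_\alpha'''\|_2^2. \tag{3.6} \]

Hence,

\[ \int_{-\infty}^{+\infty} x^2\, |p(x) - \varphi_\alpha(x)|\, dx \le \Big(\frac{1}{2}\, \|f - g_\alpha\|_2^2 + \|f''' - g_\alpha'''\|_2^2\Big)^{1/2}
   \le \|f - g_\alpha\|_2 + \|f''' - g_\alpha'''\|_2, \]

and from (3.4),

\[ \frac{1}{2} \int_{|x|>T} x^2 p(x)\, dx \le \frac{1}{2}\, e^{-T^2/2} + \frac{1}{\sqrt{2\pi}}\, T e^{-T^2/2}
   + \frac{1}{2}\, \|f - g_\alpha\|_2 + \frac{1}{2}\, \|f''' - g_\alpha'''\|_2. \tag{3.7} \]

Using the bounds (3.3) and (3.7) in the inequality (2.3), we therefore obtain that

\[ D(X\|Z) \le \frac{5}{2}\, e^{-T^2/2} + \frac{1}{\sqrt{2\pi}}\, T e^{-T^2/2}
   + \sqrt{2\pi} \int_{-T}^{T} (p(x) - \varphi(x))^2\, e^{x^2/2}\, dx + \|f - g_\alpha\|_2 + \|f''' - g_\alpha'''\|_2. \tag{3.8} \]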

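Next, let us consider the integral in (3.8). First, writing

\[ p(x) - \varphi(x) = (p(x) - \varphi_\alpha(x)) + \alpha\, \frac{x^3 - 3x}{3!}\, \varphi(x) \]

and applying an elementary inequality $(a + b)^2 \le \frac{a^2}{1 - t} + \frac{b^2}{t}$ ($a, b \in \mathbf{R}$, $0 < t < 1$) with $t = 1/6$,
we get

\[ (p(x) - \varphi(x))^2 \le \frac{6}{5}\, (p(x) - \varphi_\alpha(x))^2 + \alpha^2\, \frac{(x^3 - 3x)^2}{6}\, \varphi(x)^2, \]

or equivalently,

\[ \sqrt{2\pi}\, (p(x) - \varphi(x))^2\, e^{x^2/2} \le \frac{6}{5}\, \sqrt{2\pi}\, (p(x) - \varphi_\alpha(x))^2\, e^{x^2/2}
   + \frac{\alpha^2}{6}\, (x^3 - 3x)^2\, \varphi(x). \]

Integrating this inequality over the interval $[-T, T]$ and using $\mathbf{E}\,(Z^3 - 3Z)^2 = 6$, where $Z \sim
N(0, 1)$, we obtain

\[ \sqrt{2\pi} \int_{-T}^{T} (p(x) - \varphi(x))^2\, e^{x^2/2}\, dx
   \le \frac{6}{5}\, \sqrt{2\pi} \int_{-T}^{T} (p(x) - \varphi_\alpha(x))^2\, e^{x^2/2}\, dx + \alpha^2. \]

To estimate the last integral, first note that the function $t \to e^{t/2}/(2 + t)$ is increasing for
$t \ge 0$. Hence, for all $|x| \le T$,

\[ \frac{e^{x^2/2}}{2 + x^2} \le \frac{e^{T^2/2}}{2 + T^2}, \]

and thus, using $x^2 \le 1 + x^6$ and (3.5)-(3.6),

\[ \int_{-T}^{T} (p(x) - \varphi_\alpha(x))^2\, e^{x^2/2}\, dx
   \le \frac{e^{T^2/2}}{2 + T^2} \int_{-\infty}^{+\infty} (2 + x^2)\, (p(x) - \varphi_\alpha(x))^2\, dx \]
\[ \le \frac{e^{T^2/2}}{2 + T^2}\, \frac{1}{2\pi}\, \big(3\, \|f - g_\alpha\|_2^2 + \|f''' - g_\alpha'''\|_2^2\big). \]

Putting $\varepsilon = \|f - g_\alpha\|_2 + \|f''' - g_\alpha'''\|_2$, we get

\[ \sqrt{2\pi} \int_{-T}^{T} (p(x) - \varphi(x))^2\, e^{x^2/2}\, dx
   \le \frac{18}{5\sqrt{2\pi}}\, \frac{e^{T^2/2}}{2 + T^2}\, \varepsilon^2 + \alpha^2. \]

Inserting this inequality in (3.8) leads to

\[ D(X\|Z) \le \frac{5}{2}\, e^{-T^2/2} + \frac{1}{\sqrt{2\pi}}\, T e^{-T^2/2}
   + \frac{18}{5\sqrt{2\pi}}\, \frac{e^{T^2/2}}{2 + T^2}\, \varepsilon^2 + \varepsilon + \alpha^2. \tag{3.9} \]

It remains to optimize this bound over all $T \ge 0$. As before, consider the function $\psi(t) =
e^{t/2}/(2 + t)$. It is increasing for $t \ge 0$ with $\psi(0) = \frac{1}{2}$. If $0 < \varepsilon < 2$, define $T = T_\varepsilon$ to be the
(unique) solution to the equation

\[ \psi(T^2) = \frac{1}{\varepsilon}. \]

In this case,

\[ \frac{1}{\varepsilon} = \frac{e^{T^2/2}}{2 + T^2} \le \frac{e^{T^2/2}}{2T}, \]

so $T e^{-T^2/2} \le \frac{\varepsilon}{2}$. Furthermore, note that

\[ \frac{1}{\varepsilon} = \frac{e^{T^2/2}}{2 + T^2} \le \frac{e^{T^2/2}}{2}, \]

so $e^{-T^2/2} \le \frac{\varepsilon}{2}$. Applying these bounds in (3.9), we arrive at

\[ D(X\|Z) \le \alpha^2 + 4\varepsilon, \]

which is exactly the desired inequality (3.1).

In case $\varepsilon \ge 2$, let us return to (3.8) and apply it with $T = 0$. This yields

\[ D(X\|Z) \le \frac{5}{2} + \varepsilon \le 4\varepsilon, \]

which is even better than (3.1). Thus, Proposition 3.1 is proved.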
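The last optimization step can be checked numerically. The sketch below solves $\psi(T^2) = 1/\varepsilon$ by bisection and evaluates the right-hand side of (3.9) without the term $\alpha^2$. It assumes that the constants in (3.9) are $\frac52$, $\frac{1}{\sqrt{2\pi}}$ and $\frac{18}{5\sqrt{2\pi}}$; these values are our reading of the derivation and should be treated as an assumption, as are the function names.

```python
import math

def psi(t):
    """psi(t) = exp(t/2) / (2 + t), increasing for t >= 0 with psi(0) = 1/2."""
    return math.exp(t / 2.0) / (2.0 + t)

def solve_T(eps, tol=1e-12):
    """Solve psi(T^2) = 1/eps for T >= 0 by bisection (requires 0 < eps < 2)."""
    lo, hi = 0.0, 1.0
    while psi(hi * hi) < 1.0 / eps:  # enlarge the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi(mid * mid) < 1.0 / eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bound_without_alpha(eps):
    """Assumed right-hand side of (3.9) minus alpha^2, evaluated at T = T_eps."""
    T = solve_T(eps)
    tail = math.exp(-T * T / 2.0)
    c = 18.0 / (5.0 * math.sqrt(2.0 * math.pi))
    return (2.5 * tail + T * tail / math.sqrt(2.0 * math.pi)
            + c * psi(T * T) * eps ** 2 + eps)

# Largest ratio bound / eps over a grid of eps in (0, 2).
worst = max(bound_without_alpha(e) / e for e in [k / 1000.0 for k in range(1, 2000)])
print(worst)
```

Under these assumed constants the ratio stays below 4 over the whole grid, in agreement with the constant in (3.1).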

4. Quantile Density Decomposition 

In order to effectively apply Propositions 2.1 and 3.1, one has to solve two different tasks.
The first one is to estimate integrals such as

\[ \int_{-T}^{T} |f(t) - g_\alpha(t)|^2\, dt, \qquad \int_{-T}^{T} |f'''(t) - g_\alpha'''(t)|^2\, dt \]

over sufficiently large $t$-intervals with properly chosen values of the parameter $\alpha$. When the
characteristic function $f$ has a multiplicative structure, i.e., corresponds to the sum of a
large number of small independent summands, this task can be attacked by using classical
Edgeworth-type expansions (for characteristic functions). Such expansions are well-known
including the non-i.i.d. case, and we consider one of them in Section 12.
The second task concerns an estimation of integrals such as

\[ \int_{|t| \ge T} |f(t)|^2\, dt, \]

which in general do not need to be small or even finite. The finiteness is guaranteed, for
example, when $f$ is the Fourier transform of a bounded density $p$. For some purposes such as
obtaining local limit theorems, it is therefore natural to restrict oneself to the case of bounded
densities. For other purposes, such as an estimation of the total variation or relative entropy,
the density $p$ may slightly be modified, so that the new density, say $\tilde p$, will be bounded, and
at the same time will only slightly change the total variation distance or relative entropy with
respect to the standard normal law.

To this aim, we shall use the so-called quantile density decomposition, based on the following
elementary observation. (In fact, it is needed in case of bounded densities, as well.)

Proposition 4.1. Let $X$ be a random variable with density $p$. Given $0 < \kappa < 1$, the real
line can be partitioned into two Borel sets $A_0$, $A_1$ such that $p(x) \le p(y)$ for all $x \in A_0$, $y \in A_1$,
and

\[ \int_{A_0} p(x)\, dx = \kappa, \qquad \int_{A_1} p(x)\, dx = 1 - \kappa. \]

The argument is based on the continuity of the measure $p(x)\, dx$ and is omitted.
Clearly, for some real number $m_\kappa$ we get

\[ A_0 \subset \{x \in \mathbf{R}: p(x) \le m_\kappa\}, \qquad A_1 \subset \{x \in \mathbf{R}: p(x) \ge m_\kappa\}. \]

Here, $m_\kappa$ represents a quantile (or one of the quantiles) for the function $p$ viewed as a random
variable on the probability space $(\mathbf{R}, p(x)\, dx)$. In other words, $m_\kappa = m_\kappa(p(X))$ is a quantile
of order $\kappa$ for the random variable $p(X)$. If $\kappa = \frac{1}{2}$, the index is usually omitted, and then
$m = m(p(X))$ denotes a median of $p(X)$.

Definition 4.2. Define the densities $p_0$ and $p_1$ to be the normalized restrictions of $p$ to
the sets $A_0$ and $A_1$, respectively. As a result, we have an equality

\[ p(x) = \kappa\, p_0(x) + (1 - \kappa)\, p_1(x), \tag{4.1} \]

which we call the quantile density decomposition for $p$ (respectively, the median density
decomposition, when $\kappa = \frac{1}{2}$).

Let us mention one obvious, but important property of the functionals $m_\kappa(p(X))$, assuming
that $X$ has a finite second moment.

Proposition 4.3. The functionals

\[ Q_\kappa(X) = m_\kappa(p(X))\, \sqrt{\mathrm{Var}(X)} \]

are affine invariant. That is, for all $a \in \mathbf{R}$ and $b \ne 0$, $Q_\kappa(a + bX) = Q_\kappa(X)$.

More precisely, one should either assume in the latter equality that the quantile $m_\kappa(p(X))$
is determined uniquely, or use specific quantiles satisfying the relation $m_\kappa(p_{a,b}(a + bX)) =
|b|^{-1}\, m_\kappa(p(X))$, where $p_{a,b}$ denotes the density of the random variable $a + bX$.
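The quantile density decomposition is easy to approximate on a grid: sort the grid cells by the value of the density and collect the cells of lowest density until their mass reaches $\kappa$. The sketch below is a discretized illustration only (the function name `quantile_decomposition` and the grid parameters are ours); it uses the Laplace density $p(x) = \frac12 e^{-|x|}$, for which $p(X)$ is uniform on $(0, \frac12)$, so that the median of $p(X)$ equals $\frac14$.

```python
import math

def quantile_decomposition(xs, ps, kappa):
    """Split grid cells into A0 (lowest density, mass about kappa) and A1 (the rest).

    xs: equally spaced grid points; ps: density values at these points.
    Returns (in_A0, m_kappa): one flag per cell and the threshold level reached.
    """
    dx = xs[1] - xs[0]
    total = sum(ps) * dx
    order = sorted(range(len(xs)), key=lambda i: ps[i])  # increasing density
    in_A0 = [False] * len(xs)
    mass, m_kappa = 0.0, 0.0
    for i in order:
        if mass >= kappa * total:
            break
        in_A0[i] = True
        mass += ps[i] * dx
        m_kappa = ps[i]
    return in_A0, m_kappa

# Laplace density, median decomposition (kappa = 1/2).
xs = [-20.0 + 0.001 * i for i in range(40001)]
ps = [0.5 * math.exp(-abs(x)) for x in xs]
in_A0, m = quantile_decomposition(xs, ps, 0.5)
mass_A0 = sum(p for p, a in zip(ps, in_A0) if a) * 0.001
print(mass_A0, m)  # close to 1/2 and 1/4, respectively
```

Here $A_0$ is (up to discretization) the set $\{|x| \ge \log 2\}$, and the restriction of $p$ to $A_0$, divided by $\kappa$, is the bounded component $p_0$ of (4.1).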

5. Properties of the Quantile Decomposition 

In this section we establish basic properties of the quantile density decomposition. Although
for purposes of Theorems 1.1-1.2 the median decomposition is sufficient, the general case is no
more difficult (but may be used to provide more freedom, especially for improving $D$-dependent
constants).

First, let us bound from above the quantiles $m_\kappa = m_\kappa(p(X))$ in terms of the entropic
distance to normality.

Proposition 5.1. Let $X$ be a random variable with finite variance $\sigma^2$ ($\sigma > 0$), having an
absolutely continuous distribution, and let $0 < \kappa < 1$. Then

\[ m_\kappa(p(X)) \le \frac{1}{\sigma\sqrt{2\pi}}\, e^{(D(X) + 1)/(1 - \kappa)}. \]

In particular,

\[ m(p(X)) \le \frac{1}{\sigma\sqrt{2\pi}}\, e^{2D(X) + 2}. \]

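Proof. By Proposition 4.3, we may assume that $X$ has mean zero and variance one. Let
$A = \{x \in \mathbf{R}: p(x) \ge m_\kappa\}$. By the definition of the quantiles,

\[ \int_A p(x)\, dx \ge 1 - \kappa. \]

Since $p(x) \ge m_\kappa$ on the set $A$, we have

\[ \int_{-\infty}^{+\infty} p(x) \log\Big(1 + \frac{p(x)}{\varphi(x)}\Big)\, dx \ge \int_A p(x) \log\Big(1 + \frac{p(x)}{\varphi(x)}\Big)\, dx
   \ge \int_A p(x) \log \frac{m_\kappa}{\varphi(x)}\, dx \]
\[ = \log(m_\kappa \sqrt{2\pi}) \int_A p(x)\, dx + \frac{1}{2} \int_A x^2 p(x)\, dx
   \ge (1 - \kappa) \log(m_\kappa \sqrt{2\pi}). \]

On the other hand, using an elementary inequality $t \log(1 + t) - t \log t \le 1$ ($t \ge 0$), we get

\[ \int_{-\infty}^{+\infty} \frac{p(x)}{\varphi(x)} \log\Big(1 + \frac{p(x)}{\varphi(x)}\Big)\, \varphi(x)\, dx
   \le \int_{-\infty}^{+\infty} \frac{p(x)}{\varphi(x)} \log \frac{p(x)}{\varphi(x)}\ \varphi(x)\, dx + 1 = D(X) + 1. \]

Hence, $(1 - \kappa) \log(m_\kappa \sqrt{2\pi}) \le D(X) + 1$, and the proposition follows.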






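Now, let $V_0$ and $V_1$ be random variables with densities $p_0$ and $p_1$ from the quantile
decomposition (4.1). They have means $a_j = \mathbf{E} V_j$ and variances $\sigma_j^2 = \mathrm{Var}(V_j)$, connected by

\[ \kappa a_0 + (1 - \kappa)\, a_1 = \mathbf{E} X, \]

and

\[ \big(\kappa a_0^2 + (1 - \kappa)\, a_1^2\big) + \big(\kappa \sigma_0^2 + (1 - \kappa)\, \sigma_1^2\big) = \mathbf{E} X^2, \tag{5.1} \]

provided that $X$ has a finite second moment.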

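The next step is to prove upper bounds for the entropies of $V_0$ and $V_1$.

Proposition 5.2. If $X$ has mean zero and finite second moment, then

\[ \kappa D(V_0) + (1 - \kappa)\, D(V_1) \le D(X) - \kappa \log \kappa - (1 - \kappa) \log(1 - \kappa). \]

In particular, in case of the median decomposition,

\[ D(V_0) + D(V_1) \le 2 D(X) + 2 \log 2. \]

Proof. Let $\mathrm{Var}(X) = \sigma^2$ ($\sigma > 0$). We may assume that $D(X)$ is finite. By Definition 4.2,

\[ -h(V_0) = \int_{-\infty}^{+\infty} p_0(x) \log p_0(x)\, dx
   = \int_{A_0} \frac{p(x)}{\kappa} \log \frac{p(x)}{\kappa}\, dx = -\log \kappa + \frac{1}{\kappa} \int_{A_0} p(x) \log p(x)\, dx, \]

and similarly, $-h(V_1) = -\log(1 - \kappa) + \frac{1}{1 - \kappa} \int_{A_1} p(x) \log p(x)\, dx$. Adding the two equalities with
weights, we get

\[ -\kappa h(V_0) - (1 - \kappa)\, h(V_1) = -\kappa \log \kappa - (1 - \kappa) \log(1 - \kappa) - h(X). \tag{5.2} \]

Recall that

\[ D(V_0) = h(Z_0) - h(V_0), \quad \text{where } Z_0 \sim N(a_0, \sigma_0^2), \]
\[ D(V_1) = h(Z_1) - h(V_1), \quad \text{where } Z_1 \sim N(a_1, \sigma_1^2), \]
\[ D(X) = h(Z) - h(X), \quad \text{where } Z \sim N(0, \sigma^2). \]

Hence, from (5.2),

\[ \kappa D(V_0) + (1 - \kappa)\, D(V_1) = \kappa h(Z_0) + (1 - \kappa)\, h(Z_1)
   - \kappa \log \kappa - (1 - \kappa) \log(1 - \kappa) + (D(X) - h(Z)) \]
\[ = \kappa \log(\sigma_0 \sqrt{2\pi e}) + (1 - \kappa) \log(\sigma_1 \sqrt{2\pi e})
   - \kappa \log \kappa - (1 - \kappa) \log(1 - \kappa) + \big(D(X) - \log(\sigma \sqrt{2\pi e})\big) \]
\[ = -\kappa \log \kappa - (1 - \kappa) \log(1 - \kappa) + D(X) + \log \frac{\sigma_0^{\kappa}\, \sigma_1^{1 - \kappa}}{\sigma}. \]

Finally, by (5.1) and the arithmetic-geometric inequality,

\[ \sigma_0^{2\kappa}\, \sigma_1^{2(1 - \kappa)} \le \kappa \sigma_0^2 + (1 - \kappa)\, \sigma_1^2 \le \sigma^2, \]

so $\frac{\sigma_0^{\kappa}\, \sigma_1^{1 - \kappa}}{\sigma} \le 1$. Proposition 5.2 is proved.

Note that bounds on $D(X)$ provide a quantitative measure of non-degeneracy of the
distributions of $V_j$ via positivity of their variances $\sigma_j^2$.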

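Proposition 5.3. Let $X$ be a random variable with mean zero and variance $\sigma^2$ ($\sigma > 0$),
having finite entropy. Then

\[ \sigma_0 \ge \sigma\, e^{-(D(X) + 4)/\kappa}, \qquad \sigma_1 \ge \sigma\, e^{-(D(X) + 4)/(1 - \kappa)}. \]

Proof. By homogeneity with respect to $\sigma$, one may assume that $\sigma = 1$.
We modify the argument from the proof of Proposition 5.1. First note that

\[ \log(\sigma_0 \sqrt{2\pi e}) = h(Z_0) \ge h(V_0) = -\int_{-\infty}^{+\infty} p_0(x) \log p_0(x)\, dx
   = -\int_{A_0} \frac{p(x)}{\kappa} \log \frac{p(x)}{\kappa}\, dx \]
\[ = \log \kappa - \frac{1}{\kappa} \int_{A_0} p(x) \log p(x)\, dx, \tag{5.3} \]

where $A_0$ is a set from Definition 4.2.

In order to estimate the last integral, put $r(x) = e^{-x^2/(2a^2)}$ with parameter $a > 0$. Using the
property $r(x) \le 1$ and once more the inequality $t \log(1 + t) \le t \log t + 1$ ($t \ge 0$), we get

\[ \int_{A_0} p(x) \log p(x)\, dx \le \int_{-\infty}^{+\infty} p(x) \log\Big(1 + \frac{p(x)}{r(x)}\Big)\, dx
   = \int_{-\infty}^{+\infty} \frac{p(x)}{r(x)} \log\Big(1 + \frac{p(x)}{r(x)}\Big)\, r(x)\, dx \]
\[ \le \int_{-\infty}^{+\infty} \frac{p(x)}{r(x)} \log \frac{p(x)}{r(x)}\ r(x)\, dx + \int_{-\infty}^{+\infty} r(x)\, dx \]
\[ = \int_{-\infty}^{+\infty} p(x) \log p(x)\, dx + \frac{1}{2a^2} \int_{-\infty}^{+\infty} p(x)\, x^2\, dx + \int_{-\infty}^{+\infty} r(x)\, dx \]
\[ = D(X) - \log(\sqrt{2\pi e}) + \frac{1}{2a^2} + a \sqrt{2\pi}. \]

The right-hand side is minimized for $a = (2\pi)^{-1/6}$, in which case we obtain that

\[ \int_{A_0} p(x) \log p(x)\, dx \le D(X) - \log(\sqrt{2\pi e}) + \frac{3}{2}\, (2\pi)^{1/3} \le D(X) + 1.35. \]

Together with (5.3), the above estimate yields

\[ \log(\sigma_0 \sqrt{2\pi e}) \ge \log \kappa - \frac{1}{\kappa}\, (D(X) + 1.35). \]

But $\log(\sqrt{2\pi e}) \approx 1.42 < \frac{1.42}{\kappa}$, so $\log \sigma_0 \ge \log \kappa - \frac{1}{\kappa}\, (D(X) + 2.77)$, or equivalently,

\[ \sigma_0 \ge \kappa\, e^{-(D(X) + 2.77)/\kappa}. \]

Finally, using $\kappa \ge e^{-1/\kappa}$, the above estimate may be simplified to

\[ \sigma_0 \ge e^{-(D(X) + 4)/\kappa}, \]

which gives the first estimate on $\sigma_0$. The second estimate for $\sigma_1$ is similar.

Thus, Proposition 5.3 is proved. Note that in case of the median decomposition, it becomes

\[ \sigma_0 \ge c\, \sigma\, e^{-2D(X)}, \qquad \sigma_1 \ge c\, \sigma\, e^{-2D(X)}, \]

where $c$ is a positive absolute constant. One may take $c = e^{-8}$, for example.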



6. Entropic Bounds for Cramér Constants of Characteristic Functions

If a random variable $X$ has an absolutely continuous distribution with density, say $p$, then,
by the Riemann-Lebesgue theorem, its characteristic function

\[ f(t) = \int_{-\infty}^{+\infty} e^{itx} p(x)\, dx \qquad (t \in \mathbf{R}) \]

satisfies $f(t) \to 0$, as $t \to \infty$. Hence, for all $T > 0$,

\[ \delta_X(T) = \sup_{|t| \ge T} |f(t)| < 1. \]

An important problem is how to quantify this separation property (that is, separation from
1) by giving explicit upper bounds on the quantity $\delta_X(T)$, sometimes called the Cramér constant.
(The property $\delta_X(T) < 1$ is referred to as Cramér's condition (C).) This problem arises naturally
in local limit theorems for densities of the sums of non-identically distributed independent
summands. Furthermore, it appears in the study of bounds and rates of convergence in the
central limit theorem for strong metrics including the total variation and relative entropy. For
our purposes, it is desirable to bound $\delta_X(T)$ explicitly in terms of the entropy of $X$ or, what
is more relevant, in terms of the entropic distance to normality $D(X)$. Thus, this quantity
controls separation of the distribution of $X$ from the class of discrete measures on the line.

A preliminary answer may be given in terms of the variance $\sigma^2 = \mathrm{Var}(X)$, when it is finite,
and in cases where the density $p$ is uniformly bounded.



Proposition 6.1. Assume $p(x) \le M$ a.e. Then, for all $t$ real,

\[ |f(t)| \le 1 - c\, \frac{\min\{1, \sigma^2 t^2\}}{M^2 \sigma^2}, \tag{6.1} \]

where $c > 0$ is an absolute constant.

In a slightly different form, this bound was obtained in the mid 1960s by Statulevičius
[St]. He also considered more complicated quantities reflecting the behavior of the density $p$
on non-overlapping intervals of the real line.

The inequality (6.1) can be generalized by involving non-bounded densities, but then $M$
should be replaced by other quantities such as quantiles $m_\kappa = m_\kappa(p(X))$ of the random variable
$p(X)$. One can also remove any assumption on the moments of $X$ by replacing the standard
deviation by the quantiles of the random variable $X - X'$, where $X'$ is an independent copy
of $X$. We refer to [B-C-G2] for details, where the following bound is derived.

Proposition 6.2. Let $X$ be a random variable with finite variance $\sigma^2$ and finite entropy.
Then, for all $t$ real,

\[ |f(t)| \le 1 - c\, \min\{1, \sigma^2 t^2\}\, e^{-4D(X)}, \tag{6.2} \]

where $c > 0$ is an absolute constant.

At the expense of a worse constant in the exponent, this bound can be derived directly
from (6.1) by combining it with Propositions 5.1 and 5.3.

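Indeed, we may assume that $\mathbf{E} X = 0$. Let $V_0$ and $V_1$ be random variables with densities
$p_0$ and $p_1$ from the median decomposition (4.1), that is, for $\kappa = \frac{1}{2}$, and denote by $f_0$ and $f_1$
the corresponding characteristic functions, so that $f = \frac{1}{2} f_0 + \frac{1}{2} f_1$. Hence, for all $t$,

\[ |f(t)| \le \frac{1}{2}\, |f_0(t)| + \frac{1}{2}. \tag{6.3} \]

Since $p_0$ is bounded, more precisely, $p_0(x) \le m = m(p(X))$, one can apply Proposition 6.1 to
the random variable $V_0$ with $M = m$. Then (6.1) and (6.3) give

\[ |f(t)| \le 1 - c\, \frac{\min\{1, \sigma_0^2 t^2\}}{m^2 \sigma_0^2}, \]

where $\sigma_0^2 = \mathrm{Var}(V_0)$ and $c > 0$ is an absolute constant. Note that $\sigma_0^2 \le 2\sigma^2$, according to (5.1).
Now, by Proposition 5.1,

\[ \frac{1}{m^2 \sigma_0^2} \ge \frac{1}{2 m^2 \sigma^2} \ge \pi\, e^{-4D(X) - 4}. \]

Hence,

\[ |f(t)| \le 1 - c_1 \min\{1, \sigma_0^2 t^2\}\, e^{-4D(X)}. \]

Finally, by Proposition 5.3, $\sigma_0^2 \ge c_2\, \sigma^2 e^{-4D(X)}$, so

\[ |f(t)| \le 1 - c_3 \min\{1, \sigma^2 t^2\}\, e^{-8D(X)} \]

with some absolute constants $c_j > 0$.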

7. Repacking of Summands 

We now consider a sequence of independent (not necessarily identically distributed) random
variables $X_1, \dots, X_n$ and their sum $S_n = X_1 + \dots + X_n$. Let $\mathbf{E} X_k = 0$, $\mathbf{E} X_k^2 = \sigma_k^2$ ($\sigma_k > 0$).
One may always assume without loss of generality that $\sigma_1^2 + \dots + \sigma_n^2 = 1$, so that $\mathrm{Var}(S_n) = 1$.

In addition, all $X_k$ are assumed to have absolutely continuous distributions, having finite
entropies in each place where the functional $D$ is used.

To study integrability properties of the characteristic function $f_n$ of $S_n$ (more precisely,
of its slightly modified variants $\tilde f_n$), it will be more convenient to work with a different
representation,

\[ S_n = V_1 + \dots + V_N, \]

where the new independent summands represent appropriate partial sums of the $X_l$ resulting in
almost equal variances, such that at the same time the number of blocks, $N$, is still reasonably
large. Such a representation may be introduced just by taking

\[ V_k = \sum_{n_{k-1} < l \le n_k} X_l \qquad (k = 1, \dots, N), \tag{7.1} \]

where $n_0 = 0$ and $n_k = \max\{l \le n: \sigma_1^2 + \dots + \sigma_l^2 \le \frac{k}{N}\}$.

The number of new summands is restricted in terms of the parameter

\[ \sigma = \max_{l \le n} \sigma_l, \]

which in general may be an arbitrary real number between 0 and 1.

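Lemma 7.1. If $N \le \frac{1}{2\sigma^2}$, then for each $k = 1, \dots, N$,

\[ \frac{1}{2N} < \mathrm{Var}(V_k) < \frac{2}{N}. \tag{7.2} \]

Proof. If $n_1 = n$, then necessarily $N = 1$ and $V_1 = S_n$, so (7.2) holds immediately.

If $n_1 < n$, then, by the definition, $\mathrm{Var}(V_1) \le \frac{1}{N}$ and $\mathrm{Var}(V_1 + X_{n_1 + 1}) > \frac{1}{N}$. The latter
implies $\mathrm{Var}(V_1) > \frac{1}{N} - \sigma^2 \ge \frac{1}{2N}$, thus proving (7.2) for $k = 1$.

Now, let $2 \le k \le N$. Again by the definition, $\mathrm{Var}(S_{n_k}) \le \frac{k}{N}$ and $\mathrm{Var}(S_{n_{k-1} + 1}) > \frac{k-1}{N}$. The
latter implies $\mathrm{Var}(S_{n_{k-1}}) > \frac{k-1}{N} - \sigma^2$. Combining the two bounds, we get

\[ \mathrm{Var}(V_k) = \mathrm{Var}(S_{n_k}) - \mathrm{Var}(S_{n_{k-1}}) < \frac{k}{N} - \Big(\frac{k-1}{N} - \sigma^2\Big) = \frac{1}{N} + \sigma^2 < \frac{2}{N}. \]

On the other hand,

\[ \mathrm{Var}(V_k) > \Big(\frac{k}{N} - \sigma^2\Big) - \frac{k-1}{N} = \frac{1}{N} - \sigma^2 \ge \frac{1}{2N}. \]

Lemma 7.1 is proved.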
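The repacking of summands is a simple greedy procedure on the cumulative variances, and Lemma 7.1 can be verified on an example. The sketch below assumes the bounds of (7.2) in the form $\frac{1}{2N} < \mathrm{Var}(V_k) < \frac{2}{N}$ under $N \le \frac{1}{2\sigma^2}$; it uses exact rational arithmetic so that comparisons with $k/N$ are not affected by rounding. The function names and the list of weights are ours and purely illustrative.

```python
from fractions import Fraction

def repack(variances, N):
    """Block end indices n_0 = 0 <= n_1 <= ... <= n_N for variances summing to 1.

    n_k = max{ l <= n : sigma_1^2 + ... + sigma_l^2 <= k/N }.
    """
    cumulative = [Fraction(0)]
    for v in variances:
        cumulative.append(cumulative[-1] + v)
    ends = [0]
    for k in range(1, N + 1):
        l = ends[-1]
        while l < len(variances) and cumulative[l + 1] <= Fraction(k, N):
            l += 1
        ends.append(l)
    return ends, cumulative

def block_variances(variances, N):
    """Variances of the blocks V_1, ..., V_N."""
    ends, cumulative = repack(variances, N)
    return [cumulative[ends[k]] - cumulative[ends[k - 1]] for k in range(1, N + 1)]

# Hypothetical weights, normalized so that the variances sum to 1.
weights = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4] * 5
total = sum(weights)
variances = [Fraction(w, total) for w in weights]
sigma2 = max(variances)
N = int(1 / (2 * sigma2))  # largest integer N with N <= 1 / (2 sigma^2)
blocks = block_variances(variances, N)
print(N, [float(b) for b in blocks])
```

All block variances lie strictly between $\frac{1}{2N}$ and $\frac{2}{N}$, and they add up to $\mathrm{Var}(S_n) = 1$.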

Thus, to obtain the property (7.2), it seems suggestive to take $N = [\frac{1}{2\sigma^2}]$ (the integer part).
However, this choice is not used in the proof of Theorems 1.1-1.2, since we need to express $N$
as a suitable function of Lyapunov's coefficients.

As another useful property of the representation (7.1), let us mention the following.

Lemma 7.2. If $\max_{l \le n} D(X_l) \le D$, then $\max_{k \le N} D(V_k) \le D$, as well.

This is due to the general bound $D(X + Y) \le \max\{D(X), D(Y)\}$, which holds for
arbitrary independent random variables with finite second moments and absolutely continuous
distributions. It can easily be derived, for example, from the entropy power inequality

\[ e^{2h(X + Y)} \ge e^{2h(X)} + e^{2h(Y)}, \]

cf. [C-D-T].

Now, let $p_k$ denote the density of the random variable $V_k$. For each $p_k$, one may consider a
median density decomposition

\[ p_k(x) = \frac{1}{2}\, p_{k0}(x) + \frac{1}{2}\, p_{k1}(x) \tag{7.3} \]

in accordance with Definition 4.2 for the parameter $\kappa = \frac{1}{2}$.

In particular, $p_{k0}(x) \le m$, where $m = m(p_k(V_k))$ is a median of the random variable $p_k(V_k)$.
Note that by Proposition 5.1 with $X = V_k$ and Lemmas 7.1-7.2, if $\max_{l \le n} D(X_l) \le D$, we
immediately obtain that

\[ m(p_k(V_k)) \le \frac{1}{v_k \sqrt{2\pi}}\, e^{2D + 2} \le \sqrt{N}\, e^{2D + 2}, \tag{7.4} \]

where $v_k = \sqrt{\mathrm{Var}(V_k)}$.

Let $V_{kj}$ be random variables with densities $p_{kj}$ and characteristic functions

\[ \hat p_{kj}(t) = \int_{-\infty}^{+\infty} e^{itx}\, p_{kj}(x)\, dx, \qquad j = 0, 1. \]

We collect their basic properties in the following lemma. 

Lemma 7.3. Assume that $N \le \frac{1}{2\sigma^2}$ and $\max_{l \le n} D(X_l) \le D$. For all $k \le N$ and $j = 0, 1$,

a) $D(V_{kj}) \le 2D + 2$,

b) $\mathrm{Var}(V_{kj}) \ge \frac{1}{2N}\, e^{-4(D + 4)}$,

c) $|\hat p_{kj}(t)| \le 1 - c\, e^{-12D}$ for all $|t| \ge \sqrt{N}$ with an absolute constant $c > 0$.

Proof. The first assertion follows from Lemma 7.2 and Proposition 5.2 applied with $X =
V_k$. For the second one, combine Proposition 5.3 with $X = V_k$ and Lemmas 7.1-7.2 to get

\[ v_{kj}^2 \ge v_k^2\, e^{-4(D + 4)} \ge \frac{1}{2N}\, e^{-4(D + 4)}, \]

where $v_{kj}^2 = \mathrm{Var}(V_{kj})$ ($v_{kj} > 0$). For the assertion in c), combine Proposition 6.2 for $X = V_{kj}$
and the previous steps, which give

\[ |\hat p_{kj}(t)| \le 1 - c\, \min\{1, v_{kj}^2 t^2\}\, e^{-4D(V_{kj})} \]
\[ \le 1 - c\, \min\{1, t^2/(2N)\}\, e^{-4(D + 4)}\, e^{-4(2D + 2)} \]
\[ \le 1 - c'\, \min\{1, t^2/N\}\, e^{-12D} \]

with some absolute constants $c, c' > 0$.

8. Decomposition of Convolutions 

Starting from the representation $S_n = V_1 + \dots + V_N$ with the summands defined in (7.1), one can write the density of $S_n$ as the convolution

\[ p_n = p_1 * \dots * p_N, \]

where $p_k$ denotes the density of $V_k$. Moreover, a direct application of the median decomposition (7.3) leads to the representation

\[ p_n = 2^{-N} \sum \big(p_{10}^{\delta_1} * p_{11}^{1-\delta_1}\big) * \dots * \big(p_{N0}^{\delta_N} * p_{N1}^{1-\delta_N}\big), \]

where the summation is carried out over all $2^N$ sequences $\delta_k$ with values $0$ and $1$.

Let an integer number $m_0 \ge 0$ be given (for our purposes, one may take $m_0 = 3$). For $N \ge m_0 + 1$, we split the above sum into the two parts, so that



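\[ p_n = q_{n0} + q_{n1}, \]

where

\[ q_{n0} = 2^{-N} \sum_{\delta_1 + \dots + \delta_N > m_0} \big(p_{10}^{\delta_1} * p_{11}^{1-\delta_1}\big) * \dots * \big(p_{N0}^{\delta_N} * p_{N1}^{1-\delta_N}\big), \]

\[ q_{n1} = 2^{-N} \sum_{\delta_1 + \dots + \delta_N \le m_0} \big(p_{10}^{\delta_1} * p_{11}^{1-\delta_1}\big) * \dots * \big(p_{N0}^{\delta_N} * p_{N1}^{1-\delta_N}\big). \]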



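Put

\[ \varepsilon_n = \int_{-\infty}^{+\infty} q_{n1}(x)\, dx = 2^{-N} \sum_{0 \le k \le m_0} \frac{N!}{k!\,(N-k)!}. \]

One can easily see that

\[ \varepsilon_n \le 2^{-(N-1)} N^{m_0}. \tag{8.1} \]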
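The bound (8.1) is stated without proof; one elementary way to check it is sketched below. The argument only uses that each convolution in $q_{n1}$ integrates to one, so that $\varepsilon_n$ is $2^{-N}$ times the number of sequences with $\delta_1 + \dots + \delta_N \le m_0$. It is an illustration, not necessarily the authors' argument.

```latex
% Each binomial coefficient satisfies \binom{N}{k} \le N^k. Hence, for N \ge 2,
\[ \sum_{k=0}^{m_0} \binom{N}{k} \ \le\ \sum_{k=0}^{m_0} N^k
   \ =\ N^{m_0} \sum_{j=0}^{m_0} N^{-j}
   \ \le\ N^{m_0}\, \frac{N}{N-1} \ \le\ 2 N^{m_0} . \]
% For N = 1 the constraint N \ge m_0 + 1 forces m_0 = 0, and the sum equals 1 \le 2.
% Therefore
\[ \varepsilon_n = 2^{-N} \sum_{k=0}^{m_0} \binom{N}{k}
   \ \le\ 2^{-N} \cdot 2 N^{m_0} = 2^{-(N-1)} N^{m_0} . \]
```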

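Definition 8.1. Put

\[ \tilde p_n(x) = p_{n0}(x) = \frac{1}{1 - \varepsilon_n}\, q_{n0}(x), \tag{8.2} \]

and similarly $p_{n1}(x) = \frac{1}{\varepsilon_n}\, q_{n1}(x)$. Thus, we get the decomposition

\[ p_n(x) = (1 - \varepsilon_n)\, p_{n0}(x) + \varepsilon_n\, p_{n1}(x). \tag{8.3} \]

Accordingly, introduce the associated characteristic functions

\[ f_n(t) = \int_{-\infty}^{+\infty} e^{itx} p_n(x)\, dx, \qquad \tilde f_n(t) = \int_{-\infty}^{+\infty} e^{itx} p_{n0}(x)\, dx. \]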



The probability densities $\tilde p_n(x) = p_{n0}(x)$ are bounded and provide a strong approximation for $p_n(x)$. Indeed, from (8.3) it follows that

\[ |\tilde p_n(x) - p_n(x)| = \varepsilon_n\, |p_{n0}(x) - p_{n1}(x)|, \tag{8.4} \]

which together with the bound (8.1) immediately implies:

Proposition 8.2. For all $n \ge N \ge m_0 + 1$,

\[ \int_{-\infty}^{+\infty} |\tilde p_n(x) - p_n(x)|\, dx \le 2^{-(N-2)} N^{m_0}. \]

In particular, the corresponding characteristic functions satisfy, for all $t \in \mathbf{R}$,

\[ |\tilde f_n(t) - f_n(t)| \le 2^{-(N-2)} N^{m_0}. \]

We need a similar inequality for derivatives of characteristic functions. To this aim, we shall use absolute moments $\mathbf{E}|X_k|^s$ and the associated Lyapunov ratios

\[ L_s = \sum_{k=1}^n \mathbf{E}|X_k|^s. \]

Let $V_{kj}$ ($1 \le k \le N$, $j = 0, 1$) be independent random variables with respective densities $p_{kj}$ from the median decomposition (7.3) for the random variables $V_k$. For each sequence $\delta = (\delta_k)_{1 \le k \le N}$ with values $0$ and $1$, the convolution

\[ p^{(\delta)} = \big(p_{10}^{\delta_1} * p_{11}^{1-\delta_1}\big) * \dots * \big(p_{N0}^{\delta_N} * p_{N1}^{1-\delta_N}\big) \]

represents the density of the sum

\[ S(\delta) = \sum_{k=1}^N \big(\delta_k V_{k0} + (1 - \delta_k) V_{k1}\big). \]

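If all moments $\mathbf{E}|X_k|^s$ are finite, (7.3) yields

\[ \mathbf{E}|V_k|^s = \frac{1}{2}\, \mathbf{E}|V_{k0}|^s + \frac{1}{2}\, \mathbf{E}|V_{k1}|^s. \tag{8.5} \]

Hence, for the $L^s$-norm $\|S(\delta)\|_s = (\mathbf{E}|S(\delta)|^s)^{1/s}$, using the Minkowski inequality, we have

\[ \|S(\delta)\|_s \le \sum_{k=1}^N \|\delta_k V_{k0} + (1 - \delta_k) V_{k1}\|_s
   \le \sum_{k=1}^N \big(\delta_k \|V_{k0}\|_s + (1 - \delta_k) \|V_{k1}\|_s\big)
   \le 2^{1/s} \sum_{k=1}^N \|V_k\|_s, \]

where (8.5) was used in the last step. But

\[ \frac{1}{N} \sum_{k=1}^N \|V_k\|_s = \frac{1}{N} \sum_{k=1}^N \big(\mathbf{E}|V_k|^s\big)^{1/s}
   \le \Big(\frac{1}{N} \sum_{k=1}^N \mathbf{E}|V_k|^s\Big)^{1/s}, \]

so

\[ \mathbf{E}|S(\delta)|^s \le 2 N^{s-1} \sum_{k=1}^N \mathbf{E}|V_k|^s \le 2 N^s\, \mathbf{E}|S_n|^s, \]

where we used $\mathbf{E}|V_k|^s \le \mathbf{E}|S_n|^s$ (due to Jensen's inequality).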

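Write $\mathbf{E}|S(\delta)|^s = \int_{-\infty}^{+\infty} |x|^s p^{(\delta)}(x)\, dx$. Recalling the definition of $q_{nj}$ and $\varepsilon_n$, we get

\[ \int_{-\infty}^{+\infty} |x|^s q_{n0}(x)\, dx = 2^{-N} \sum_{\delta_1 + \dots + \delta_N > m_0} \mathbf{E}|S(\delta)|^s \le 2\, \mathbf{E}|S_n|^s (1 - \varepsilon_n) N^s, \]

\[ \int_{-\infty}^{+\infty} |x|^s q_{n1}(x)\, dx = 2^{-N} \sum_{\delta_1 + \dots + \delta_N \le m_0} \mathbf{E}|S(\delta)|^s \le 2\, \mathbf{E}|S_n|^s\, \varepsilon_n N^s. \]

Hence, by the definition of $p_{n0}$,

\[ \int_{-\infty}^{+\infty} |x|^s p_{n0}(x)\, dx \le 2\, \mathbf{E}|S_n|^s N^s, \]

and similarly for $p_{n1}$. But, from (8.4),

\[ |\tilde p_n(x) - p_n(x)| \le \varepsilon_n \big(p_{n0}(x) + p_{n1}(x)\big), \]

so, applying (8.1),

\[ \int_{-\infty}^{+\infty} |x|^s\, |\tilde p_n(x) - p_n(x)|\, dx \le \mathbf{E}|S_n|^s\, 2^{-(N-3)} N^{m_0 + s}. \]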

On the other hand, Rosenthal's inequality (cf. e.g. [Ro], [P-U]) gives

\[ \mathbf{E}|S_n|^s \le C_s \Big(1 + \sum_{j=1}^n \mathbf{E}|X_j|^s\Big) = C_s (1 + L_s), \qquad s \ge 2, \tag{8.6} \]

with some constants $C_s$, depending on $s$, only (where the assumption $\mathbf{E}S_n^2 = 1$ is used). Note that in case $1 \le s \le 2$, there is also an obvious bound $\mathbf{E}|S_n|^s \le 1$.

One may summarize, using the constant $C_s$ in Rosenthal's inequality (8.6).

Proposition 8.3. Assume that $L_s$ is finite ($s \ge 2$). For all $n \ge N \ge m_0 + 1$,

\[ \int_{-\infty}^{+\infty} |x|^s\, |\tilde p_n(x) - p_n(x)|\, dx \le C_s (1 + L_s)\, 2^{-(N-3)} N^{m_0 + s}. \]

In particular, if $s$ is an integer, the $s$-th derivative of the corresponding characteristic functions satisfies, for all $t$ real,

\[ |\tilde f_n^{(s)}(t) - f_n^{(s)}(t)| \le C_s (1 + L_s)\, 2^{-(N-3)} N^{m_0 + s}. \]

For $s = 1$ and $s = 2$, it is better to use $\mathbf{E}|S_n| \le 1$ and $\mathbf{E}S_n^2 = 1$ instead of (8.6). For $s = 3$, Rosenthal's inequality can be shown to hold with constant $C_3 = 2$. Hence, we obtain:

Corollary 8.4. For all $n \ge N \ge m_0 + 1$ and $t \in \mathbf{R}$,

\[ |\tilde f_n^{(s)}(t) - f_n^{(s)}(t)| \le 2^{-(N-3)} N^{m_0 + s} \qquad (s = 1, 2). \]

Moreover, if $L_3$ is finite,

\[ |\tilde f_n'''(t) - f_n'''(t)| \le (1 + L_3)\, 2^{-(N-4)} N^{m_0 + 3}. \]



9. Entropic Approximation of $p_n$ by $\tilde p_n$

As before, let $X_1, \dots, X_n$ be independent random variables with $\mathbf{E}X_k = 0$, $\mathbf{E}X_k^2 = \sigma_k^2$ ($\sigma_k > 0$), such that $\sigma_1^2 + \dots + \sigma_n^2 = 1$. Moreover, let $X_k$ have absolutely continuous distributions with finite entropies, and let $p_n$ denote the density of the sum

\[ S_n = X_1 + \dots + X_n. \]

Put $\sigma^2 = \max_k \sigma_k^2$.

The next step is to extend the assertion of Propositions 8.2-8.3 to relative entropies, with respect to the standard normal distribution on the real line with density

\[ \varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}. \]

Thus put



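\[ D_n = \int_{-\infty}^{+\infty} p_n(x) \log \frac{p_n(x)}{\varphi(x)}\, dx, \qquad
   \tilde D_n = \int_{-\infty}^{+\infty} \tilde p_n(x) \log \frac{\tilde p_n(x)}{\varphi(x)}\, dx. \]

Recall that the modified densities $\tilde p_n$ are constructed in Definition 8.1 with arbitrary integers $0 \le m_0 < N \le n$ on the basis of the representation (7.1), based on the independent random variables $V_k$ and the median decomposition (7.3) for the densities $p_k$ of $V_k$.

Proposition 9.1. Let $D = \max_k D(X_k)$. Given that $m_0 + 1 \le N \le \frac{1}{2\sigma^2}$, we have

\[ |\tilde D_n - D_n| \le 2^{-(N-6)} N^{m_0 + 1} (D + 1). \tag{9.1} \]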



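We shall use a few elementary properties of the convex function $L(u) = u \log u$ ($u \ge 0$).

Lemma 9.2. For all $u, v \ge 0$ and $0 \le \varepsilon \le 1$,

a) $L((1 - \varepsilon) u + \varepsilon v) \le (1 - \varepsilon) L(u) + \varepsilon L(v)$;

b) $L((1 - \varepsilon) u + \varepsilon v) \ge (1 - \varepsilon) L(u) + \varepsilon L(v) + u L(1 - \varepsilon) + v L(\varepsilon)$.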



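Proof of Proposition 9.1. Define

\[ D_{nj} = \int_{-\infty}^{+\infty} p_{nj}(x) \log \frac{p_{nj}(x)}{\varphi(x)}\, dx \qquad (j = 0, 1), \]

so that $\tilde D_n = D_{n0}$, where the densities $p_{nj}$ have been defined in (8.2)-(8.3).

By Lemma 9.2 a), $D_n \le (1 - \varepsilon_n) D_{n0} + \varepsilon_n D_{n1}$. On the other hand, by Lemma 9.2 b),

\[ D_n \ge \big((1 - \varepsilon_n) D_{n0} + \varepsilon_n D_{n1}\big) + \varepsilon_n \log \varepsilon_n + (1 - \varepsilon_n) \log(1 - \varepsilon_n). \]

The two estimates give

\[ |\tilde D_n - D_n| \le \varepsilon_n (D_{n0} + D_{n1}) - \varepsilon_n \log \varepsilon_n - (1 - \varepsilon_n) \log(1 - \varepsilon_n). \tag{9.2} \]

Hence, we need to give appropriate bounds on both $D_{n0}$ and $D_{n1}$.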

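To this aim, as before, let $V_{kj}$ ($1 \le k \le N$, $j = 0, 1$) be independent random variables with respective densities $p_{kj}$ from the median decomposition (7.3) for $V_k$. By Definition 4.2, we have the identity (5.1), which for $V_k$ reads

\[ v_k^2 = \frac{1}{2} \big(v_{k0}^2 + a_{k0}^2\big) + \frac{1}{2} \big(v_{k1}^2 + a_{k1}^2\big), \]

where $a_{kj} = \mathbf{E}V_{kj}$, $v_{kj}^2 = \mathrm{Var}(V_{kj})$ and $v_k^2 = \mathrm{Var}(V_k)$. Using Lemma 7.1, this implies

\[ v_{k0}^2 \le 2 v_k^2 \le \frac{4}{N}, \qquad v_{k1}^2 \le 2 v_k^2 \le \frac{4}{N}. \tag{9.3} \]

As in the previous section, for each sequence $\delta = (\delta_k)_{1 \le k \le N}$ with values $0$ and $1$, consider the convolution

\[ p^{(\delta)} = \big(p_{10}^{\delta_1} * p_{11}^{1-\delta_1}\big) * \dots * \big(p_{N0}^{\delta_N} * p_{N1}^{1-\delta_N}\big), \]

i.e., the densities of the random variables

\[ S(\delta) = \sum_{k=1}^N \big(\delta_k V_{k0} + (1 - \delta_k) V_{k1}\big). \]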

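By convexity of the function $u \log u$,

\[ D_{n1} \le \frac{1}{\varepsilon_n}\, 2^{-N} \sum_{\delta_1 + \dots + \delta_N \le m_0} \int_{-\infty}^{+\infty} p^{(\delta)}(x) \log \frac{p^{(\delta)}(x)}{\varphi(x)}\, dx, \tag{9.4} \]

\[ D_{n0} \le \frac{1}{1 - \varepsilon_n}\, 2^{-N} \sum_{\delta_1 + \dots + \delta_N > m_0} \int_{-\infty}^{+\infty} p^{(\delta)}(x) \log \frac{p^{(\delta)}(x)}{\varphi(x)}\, dx. \tag{9.5} \]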

Furthermore, if $S$ denotes a random variable with variance $v^2$ ($v > 0$) having density $p$, and if $Z$ is a standard normal random variable, the relative entropy of $S$ with respect to $Z$ is connected with the entropic distance to normality $D(S)$ by the simple formula

\[ D(S\|Z) = \int_{-\infty}^{+\infty} p(x) \log \frac{p(x)}{\varphi(x)}\, dx = D(S) + \log \frac{1}{v} + \frac{\mathbf{E}S^2 - 1}{2}. \tag{9.6} \]
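Formula (9.6) follows from a direct computation, sketched here under the assumption that $D(S) = h(Z_v) - h(S)$, where $Z_v$ denotes a normal random variable with the same mean and variance $v^2$ as $S$ (the symbol $Z_v$ is introduced only for this illustration).

```latex
% With \varphi(x) = (2\pi)^{-1/2} e^{-x^2/2} one has -\log \varphi(x) = \tfrac12 \log(2\pi) + \tfrac12 x^2, so
\[ D(S\|Z) = \int p \log p \, dx - \int p \log \varphi \, dx
           = -h(S) + \tfrac12 \log(2\pi) + \tfrac12\, \mathbf{E} S^2 . \]
% The entropy of Z_v is h(Z_v) = \tfrac12 \log(2\pi e v^2), hence
\[ -h(S) = D(S) - h(Z_v) = D(S) - \tfrac12 \log(2\pi) - \tfrac12 - \log v . \]
% Adding the two lines:
\[ D(S\|Z) = D(S) + \log\frac{1}{v} + \frac{\mathbf{E} S^2 - 1}{2} . \]
```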

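In the case $S = S(\delta)$, applying Lemma 7.3 b), we have

\[ v^2 = \mathrm{Var}(S(\delta)) = \sum_{k=1}^N \big(\delta_k v_{k0}^2 + (1 - \delta_k) v_{k1}^2\big) \ge \frac{1}{2}\, e^{-4(D+4)}, \]

hence

\[ \log \frac{1}{v} \le 2D + 9. \tag{9.7} \]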

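In addition, arguing as in the derivation of Proposition 8.3, specialized to the particular case $s = 2$, and applying (9.3), we get

\[ \|S(\delta)\|_2 \le \sum_{k=1}^N \|\delta_k V_{k0} + (1 - \delta_k) V_{k1}\|_2
   \le \sum_{k=1}^N \big(\delta_k \|V_{k0}\|_2 + (1 - \delta_k) \|V_{k1}\|_2\big) \le 2 \sqrt{N}. \]

Hence, $\mathbf{E}S(\delta)^2 \le 4N$. Combining this estimate with (9.7), we get that

\[ \log \frac{1}{v} + \frac{\mathbf{E}S(\delta)^2 - 1}{2} \le (2D + 9) + 2N. \]

Consequently, if we apply this bound in (9.6) with $S = S(\delta)$, we obtain

\[ D(S(\delta)\|Z) \le D(S(\delta)) + (2D + 9) + 2N. \tag{9.8} \]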

The remaining term, $D(S(\delta))$, can be estimated by virtue of the same general inequality $D(X + Y) \le \max\{D(X), D(Y)\}$ mentioned before. This bound can be applied to all summands of $S(\delta)$, which together with Lemma 7.3 a) gives

\[ D(S(\delta)) \le \max_{1 \le k \le N} \max\{D(V_{k0}), D(V_{k1})\} \le 2D + 2. \]

Applying this result in (9.8), we arrive at

\[ \int_{-\infty}^{+\infty} p^{(\delta)}(x) \log \frac{p^{(\delta)}(x)}{\varphi(x)}\, dx = D(S(\delta)\|Z) \le 4D + 11 + 2N. \]

Finally, by (9.4)-(9.5), we have similar bounds for $D_{n0}$ and $D_{n1}$, namely,

\[ D_{n0} \le 4D + 11 + 2N, \qquad D_{n1} \le 4D + 11 + 2N. \]

Having obtained these estimates, we are prepared to return to (9.2), which thus gives

\[ |\tilde D_n - D_n| \le 2 \varepsilon_n (4D + 11 + 2N) + \varepsilon_n \log \frac{1}{\varepsilon_n} + (1 - \varepsilon_n) \log \frac{1}{1 - \varepsilon_n}. \tag{9.9} \]

To simplify this bound, consider the function $H(\varepsilon) = \varepsilon \log \frac{1}{\varepsilon} + (1 - \varepsilon) \log \frac{1}{1 - \varepsilon}$, which is defined for $0 \le \varepsilon \le 1$, is concave and symmetric about the point $\frac{1}{2}$, where it attains its maximum $H(\frac{1}{2}) = \log 2$. Recall (8.1), that is, $\varepsilon_n \le d_n = 2^{-(N-1)} N^{m_0}$.

If $d_n \ge \frac{1}{2}$, then

\[ H(\varepsilon_n) \le \log 2 < 2 d_n = 2^{-(N-2)} N^{m_0}. \tag{9.10} \]

Note that

\[ \log \frac{1}{d_n} = m_0 \log \frac{1}{N} + (N - 1) \log 2 < N. \]

Hence, in the other case $d_n < \frac{1}{2}$, we have

\[ H(\varepsilon_n) \le H(d_n) \le 2 d_n \log \frac{1}{d_n} \le 2^{-(N-2)} N^{m_0 + 1}. \tag{9.11} \]

Comparing (9.10) and (9.11), we see that they can be combined to the following estimate

\[ H(\varepsilon_n) \le 2^{-(N-2)} N^{m_0 + 1}, \]

which is valid regardless of whether $d_n$ is greater or smaller than $\frac{1}{2}$.
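The two inequalities used for $H$ in the case $d_n < \frac{1}{2}$, namely monotonicity on $[0, \frac12]$ and $H(d) \le 2 d \log \frac{1}{d}$, are elementary; one possible verification is sketched below. The auxiliary function $\psi$ is introduced here only for illustration.

```latex
% Monotonicity: H'(\varepsilon) = \log\frac{1-\varepsilon}{\varepsilon} \ge 0 for 0 < \varepsilon \le 1/2,
% so H(\varepsilon_n) \le H(d_n) whenever \varepsilon_n \le d_n \le 1/2.
%
% Second inequality: put, for 0 < d \le 1/2,
\[ \psi(d) = d \log\frac{1}{d} - (1-d) \log\frac{1}{1-d} . \]
% Then \psi(0+) = 0, \psi(1/2) = 0, and
\[ \psi''(d) = -\frac{1}{d} + \frac{1}{1-d} \le 0 \qquad (0 < d \le \tfrac12), \]
% so \psi is concave on (0, 1/2] with vanishing boundary values, hence \psi \ge 0 there. Consequently
\[ H(d) = d \log\frac{1}{d} + (1-d)\log\frac{1}{1-d} \ \le\ 2 d \log\frac{1}{d} . \]
```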
Using this estimate in (9.9), we finally get

\[ |\tilde D_n - D_n| \le 2^{-(N-2)} N^{m_0} (4D + 11 + 2N) + 2^{-(N-2)} N^{m_0 + 1}
   = 2^{-(N-2)} N^{m_0} (4D + 11 + 3N). \]

Since $4D + 11 + 3N \le 2^4 N (D + 1)$, we arrive at the desired inequality (9.1).
Thus, Proposition 9.1 is proved.
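The last numerical step of the proof can be checked in one line. The following is only a sanity check of the arithmetic, using $N \ge 1$ and $D \ge 0$.

```latex
% For N \ge 1 and D \ge 0:
\[ 2^4 N (D+1) - (4D + 11 + 3N) = 4D\,(4N - 1) + (13 N - 11) \ \ge\ 0 + 2 \ >\ 0 , \]
% and therefore
\[ 2^{-(N-2)} N^{m_0} (4D + 11 + 3N) \ \le\ 2^{-(N-2)} \cdot 2^{4}\, N^{m_0+1} (D+1)
   = 2^{-(N-6)} N^{m_0+1} (D+1) . \]
```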

10. Integrability of Characteristic Functions $\tilde f_n$ and their Derivatives

Now we turn to the question of quantitative bounds for the modified characteristic functions $\tilde f_n$ in terms of the maximal entropic distance to normality

\[ D = \max_{k \le n} D(X_k). \]

Again, let $X_1, \dots, X_n$ be independent random variables with $\mathbf{E}X_k = 0$, $\mathbf{E}X_k^2 = \sigma_k^2$ ($\sigma_k > 0$), such that $\sigma_1^2 + \dots + \sigma_n^2 = 1$. Moreover, all $X_k$ are assumed to have absolutely continuous distributions with finite entropies.

We assume that the modified density $\tilde p_n$ and its characteristic function $\tilde f_n$ have been constructed for arbitrary integers $m_0 + 1 \le N \le n$. Put $\sigma = \max_k \sigma_k$.

Proposition 10.1. If $m_0 \ge 1$ and $m_0 + 1 \le N \le \frac{1}{2\sigma^2}$, then

\[ \int_{|t| \ge \sqrt{N}} |\tilde f_n(t)|^2\, dt \le C \sqrt{N}\, e^{-cN} \tag{10.1} \]

with some positive constants $C$ and $c$, depending on $D$, only.

In fact, one can choose the constants to be of the form $C = e^{2D+4}$ and $c = c_0 e^{-12D}$, where $c_0$ is a positive absolute factor.

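Proof. Consider any convolution

\[ p = \big(p_{10}^{\delta_1} * p_{11}^{1-\delta_1}\big) * \dots * \big(p_{N0}^{\delta_N} * p_{N1}^{1-\delta_N}\big) \]

participating in the definition of $q_{n0}$, that is, with $\delta_1 + \dots + \delta_N > m_0$. It has the Fourier transform

\[ \hat p(t) = \int_{-\infty}^{+\infty} e^{itx} p(x)\, dx = \prod_{k=1}^N \hat p_{k0}(t)^{\delta_k}\, \hat p_{k1}(t)^{1-\delta_k}, \tag{10.2} \]

where $\hat p_{kj}$ denote the characteristic functions of the random variables $V_{kj}$ from the median decomposition (4.1) with $X = V_k$ ($1 \le k \le N$, $j = 0, 1$). In every such convolution there are at least $m_0 + 1$ terms $\hat p_{k0}$ for which $\delta_k = 1$. For definiteness, let $k = N$ be one of them, so that $\delta_N = 1$. Then, we may write

\[ \hat p(t) = \hat p_{N0}(t) \prod_{k=1}^{N-1} \hat p_{k0}(t)^{\delta_k}\, \hat p_{k1}(t)^{1-\delta_k}. \tag{10.3} \]

By Lemma 7.3 c), for all $|t| \ge \sqrt{N}$,

\[ |\hat p_{kj}(t)| \le \exp\{-c_0 e^{-12D}\} \tag{10.4} \]

with some absolute constant $c_0 > 0$. Inserting this in (10.3) and using $N \ge 2$ leads to

\[ |\hat p(t)|^2 \le A\, |\hat p_{N0}(t)|^2, \qquad A = \exp\{-c_0 e^{-12D} N\}, \tag{10.5} \]

where $c_0 > 0$ is a different absolute constant.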

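Now, integrate (10.5) over the region $|t| \ge \sqrt{N}$ and use Plancherel's formula. Applying the property $p_{N0}(x) \le m = m(p_N(V_N))$, we get

\[ \int_{|t| \ge \sqrt{N}} |\hat p(t)|^2\, dt \le A \int_{-\infty}^{+\infty} |\hat p_{N0}(t)|^2\, dt = 2\pi A \int_{-\infty}^{+\infty} p_{N0}(x)^2\, dx \le 2\pi A m. \tag{10.6} \]

But, as noted in (7.4), we have $m \le e^{2D+2} \sqrt{N}$, so together with $2\pi < e^2$ (10.6) gives the desired bound

\[ \int_{|t| \ge \sqrt{N}} |\hat p(t)|^2\, dt \le e^{2D+4} \sqrt{N}\, e^{-cN} \qquad (c = c_0 e^{-12D}) \]

for $p$. But $\tilde f_n$ is a finite convex combination of such functions, so (10.1) immediately follows.
Thus Proposition 10.1 is proved.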
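The passage from (10.3)-(10.4) to (10.5), and the final convexity step, can be made explicit as follows. This is a sketch in the notation of the proof; the weights $w_\delta$ are introduced here only to name the coefficients of the convex combination.

```latex
% Each of the N-1 remaining factors in (10.3) is bounded by (10.4), so for |t| \ge \sqrt{N}
\[ |\hat p(t)|^2 \ \le\ |\hat p_{N0}(t)|^2 \exp\{-2 (N-1)\, c_0 e^{-12D}\}
   \ \le\ |\hat p_{N0}(t)|^2 \exp\{- c_0 e^{-12D} N\} , \]
% since 2(N-1) \ge N for N \ge 2.
%
% Convexity step: if \tilde f_n = \sum_\delta w_\delta\, \hat p^{(\delta)} with w_\delta \ge 0, \sum_\delta w_\delta = 1, then
\[ |\tilde f_n(t)|^2 \ \le\ \sum_\delta w_\delta\, |\hat p^{(\delta)}(t)|^2 , \]
% so a bound on \int_{|t| \ge \sqrt{N}} |\hat p^{(\delta)}|^2 dt that is uniform in \delta carries over to \tilde f_n.
```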



Next, we shall extend Proposition 10.1 to the derivatives of $\tilde f_n$, which are needed up to order $s = 3$ in case of finite 4-th moments of $X_k$. Assume that $s \ge 1$ is an arbitrary integer.

Consider the characteristic functions $\hat p$ in (10.2). Recall that $\tilde f_n$ represents a convex combination of such characteristic functions over all sequences $\delta = (\delta_1, \dots, \delta_N)$ such that $\delta_1 + \dots + \delta_N \ge m_0 + 1$. Hence, it will be sufficient to derive an estimate, such as (10.1), for any admissible fixed sequence $\delta$.

Put

\[ u_k = \hat p_{k0}^{\delta_k}\, \hat p_{k1}^{1-\delta_k} \qquad (1 \le k \le N), \]

which is the characteristic function of the random variable $\delta_k V_{k0} + (1 - \delta_k) V_{k1}$.

Thus, $\hat p = \prod_{k=1}^N u_k$. For the $s$-th derivative of the product we write a general polynomial formula

\[ \hat p^{(s)} = \sum \binom{s}{s_1\, \dots\, s_N}\, u_1^{(s_1)} \cdots u_N^{(s_N)}, \]

where the summation runs over all integer numbers $s_1, \dots, s_N \ge 0$, such that $s_1 + \dots + s_N = s$.

Fix such a sequence $s_1, \dots, s_N$. Note that it contains at most $s$ non-zero terms. The sequence $\delta = (\delta_1, \dots, \delta_N)$ defining $\hat p$ satisfies $\delta_1 + \dots + \delta_N \ge m_0 + 1$. Hence, in the row $u_1^{(s_1)}, \dots, u_N^{(s_N)}$ there are at least $m_0 + 1$ terms corresponding to $\delta_k = 1$. Therefore, if $m_0 \ge s$, there is at least one index, say $k$, for which $\delta_k = 1$ and in addition $s_k = 0$. For definiteness, let $k = N$, so that

\[ u_1^{(s_1)} \cdots u_N^{(s_N)} = \hat p_{N0}\, u_1^{(s_1)} \cdots u_{N-1}^{(s_{N-1})}. \tag{10.7} \]

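If $s_k > 0$, then

\[ |u_k^{(s_k)}(t)| \le \mathbf{E}|\delta_k V_{k0} + (1 - \delta_k) V_{k1}|^{s_k} \le \max\{\mathbf{E}|V_{k0}|^{s_k}, \mathbf{E}|V_{k1}|^{s_k}\}. \]

But, by the decomposition (7.3) and Jensen's inequality,

\[ \frac{1}{2}\, \mathbf{E}|V_{k0}|^{s_k} + \frac{1}{2}\, \mathbf{E}|V_{k1}|^{s_k} = \mathbf{E}|V_k|^{s_k} \le \mathbf{E}|S_n|^{s_k}, \]

so $|u_k^{(s_k)}(t)| \le 2\, \mathbf{E}|S_n|^{s_k}$. Hence,

\[ \prod_{s_k > 0} |u_k^{(s_k)}(t)| \le 2^s \prod_{s_k > 0} \mathbf{E}|S_n|^{s_k} \le 2^s \prod_{s_k > 0} \big(\mathbf{E}|S_n|^s\big)^{s_k/s} = 2^s\, \mathbf{E}|S_n|^s. \tag{10.8} \]

When $s_k = 0$, we apply the estimate (10.4) on Cramer's constants, which may be used in (10.7). Note that (10.4) is fulfilled for at least $(N - 1) - (s - 1) \ge N - m_0$ indices $k \le N - 1$. Hence, using also (10.8), we get

\[ |u_1^{(s_1)}(t) \cdots u_N^{(s_N)}(t)| \le C\, |\hat p_{N0}(t)| \exp\{-c_0 (N - m_0) e^{-12D}\}, \qquad C = 2^s\, \mathbf{E}|S_n|^s. \]

In case $N \ge 2 m_0$, one may simplify this bound by writing $N - m_0 \ge \frac{N}{2}$. In addition, since the sum of the multinomial coefficients in the representation of $\hat p^{(s)}$ is equal to $N^s$, and using Jensen's inequality for the quadratic function, we arrive at

\[ |\hat p^{(s)}(t)|^2 \le A\, |\hat p_{N0}(t)|^2, \qquad A = C N^s \exp\{-c_0 e^{-12D} N\}, \]

with some absolute constant $c_0 > 0$. It remains to integrate this inequality like in (10.6) over the region $|t| \ge \sqrt{N}$ and apply the estimate (7.4). As a result, we obtain

\[ \int_{|t| \ge \sqrt{N}} |\hat p^{(s)}(t)|^2\, dt \le A\, e^{2D+4} \sqrt{N}. \]

Since $\tilde f_n^{(s)}$ is a convex combination of the functions $\hat p^{(s)}$, a similar inequality holds for $\tilde f_n^{(s)}(t)$, as well. That is,

\[ \int_{|t| \ge \sqrt{N}} |\tilde f_n^{(s)}(t)|^2\, dt \le 2^s\, \mathbf{E}|S_n|^s\, e^{2D+4} \exp\{-c_0 e^{-12D} N\}\, N^{s + 1/2}. \]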
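Two standard facts are used silently in the argument above: the moment inequality behind the middle step of (10.8), and the value $N^s$ for the sum of the multinomial coefficients. Both are recalled below for convenience; this is a reminder rather than part of the original derivation.

```latex
% Lyapunov's (Jensen's) inequality: for 0 < s_k \le s,
\[ \mathbf{E}|S_n|^{s_k} \ \le\ \big(\mathbf{E}|S_n|^{s}\big)^{s_k/s} , \]
% and, since the non-zero s_k sum to s, the exponents s_k/s add up to 1:
\[ \prod_{s_k > 0} \big(\mathbf{E}|S_n|^{s}\big)^{s_k/s} = \mathbf{E}|S_n|^{s} . \]
% Multinomial theorem with all N arguments equal to 1:
\[ \sum_{s_1 + \dots + s_N = s} \frac{s!}{s_1! \cdots s_N!} = (1 + \dots + 1)^s = N^s . \]
```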

For $s = 1$ and $s = 2$, we have $\mathbf{E}|S_n|^s \le 1$, while for $s \ge 3$, one may use Rosenthal's inequality (8.6). In particular, for $s = 3$ it gives $\mathbf{E}|S_n|^3 \le 2 (1 + L_3)$.
Summarizing the results obtained so far, we have:

Proposition 10.2. Let $m_0 \ge 3$ and $2 m_0 \le N \le \frac{1}{2\sigma^2}$. Then

\[ \int_{|t| \ge \sqrt{N}} |\tilde f_n^{(s)}(t)|^2\, dt \le C N^{s + 1/2} e^{-cN} \qquad (s = 1, 2) \tag{10.9} \]

with positive constants $C$ and $c$, depending on $D$, only. Moreover, if $L_s$ is finite, $s \ge 3$ is an integer, and $m_0 \ge s$, then

\[ \int_{|t| \ge \sqrt{N}} |\tilde f_n^{(s)}(t)|^2\, dt \le C \cdot C_s (1 + L_s)\, N^{s + 1/2} e^{-cN}. \]

Here, the constants C = e^^"*"^ and c = coe~^^^ are of the same form as in Proposition 
10.1, and Cg is a constant in Rosenthal's inequality (8.6). In particular, for s = 3, we arrive at 



\[ \int_{|t| \ge \sqrt{N}} |\tilde f_n'''(t)|^2\, dt \le C (1 + L_3)\, N^{7/2} e^{-cN}. \tag{10.10} \]

Note also that, for $s = 0$, (10.9) is true, as well, and returns us to Proposition 10.1.



11. Proof of Theorem 1.1 and its Refinement 

We are now ready to complete the proof of Theorems 1.1-1.2 and emphasize some of their refinements. Thus, let $X_1, \dots, X_n$ be independent random variables with mean zero and finite third absolute moments, having finite entropies, and such that the sum $S_n = X_1 + \dots + X_n$ has variance $\mathrm{Var}(S_n) = 1$.

Our main quantities are the Lyapunov coefficient

\[ L_3 = \sum_{k=1}^n \mathbf{E}|X_k|^3 \]

and the maximal entropic distance to normality $D = \max_k D(X_k)$.

To bound the total variation distance $\|F_n - \Phi\|_{\rm TV}$ from the distribution of $S_n$ to the standard normal law one may apply the general bound (2.1) of Proposition 2.1. However, it is only applicable when the characteristic function of $S_n$ and its derivative are square integrable. But even in the case that, for example, each density $p_k$ of $X_k$ is bounded individually, we still could not properly bound the maximum of the convolutions of these densities explicitly in terms of $D$ and $L_3$. That is why we are forced to consider modified forms of $p_n$.

Thus, consider these modifications $\tilde p_n$ together with their Fourier transforms $\tilde f_n$ described in Definition 8.1. By the triangle inequality,

\[ \|F_n - \Phi\|_{\rm TV} \le \|\tilde F_n - \Phi\|_{\rm TV} + \|\tilde F_n - F_n\|_{\rm TV}, \tag{11.1} \]

where $\tilde F_n$ denotes the distribution with density $\tilde p_n$.

In the construction of $\tilde p_n$ it suffices to take the values $m_0 = 3$ and $6 \le N \le \frac{1}{2\sigma^2}$. Then, by Proposition 8.2,

\[ \int_{-\infty}^{+\infty} |\tilde p_n(x) - p_n(x)|\, dx \le 2^{-(N-2)} N^3. \tag{11.2} \]

This gives a sufficiently good bound on the last term in (11.1), if $N$ is sufficiently large.

The first term on the right-hand side of (11.1) can be bounded by virtue of (2.1), which gives

\[ \|\tilde F_n - \Phi\|_{\rm TV}^2 \le \frac{1}{2}\, \|\tilde f_n - g\|_2^2 + \frac{1}{2}\, \|(\tilde f_n)' - g'\|_2^2, \tag{11.3} \]

where $g(t) = e^{-t^2/2}$. To estimate the $L^2$-norms, first write

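\[ \frac{1}{2}\, \|\tilde f_n - g\|_2^2 \le \frac{1}{2} \int_{|t| \le \sqrt{N}} |\tilde f_n(t) - g(t)|^2\, dt + \int_{|t| > \sqrt{N}} |\tilde f_n(t)|^2\, dt + \int_{|t| > \sqrt{N}} g(t)^2\, dt. \]

Since $|\tilde f_n(t) - f_n(t)| \le 2^{-(N-2)} N^3$, we have

\[ \frac{1}{2} \int_{|t| \le \sqrt{N}} |\tilde f_n(t) - g(t)|^2\, dt \le \int_{|t| \le \sqrt{N}} |\tilde f_n(t) - f_n(t)|^2\, dt + \int_{|t| \le \sqrt{N}} |f_n(t) - g(t)|^2\, dt \]

\[ \le \int_{|t| \le \sqrt{N}} |f_n(t) - g(t)|^2\, dt + 2^{-(2N-5)} N^{13/2}. \tag{11.4} \]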

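In addition, by Proposition 10.1,

\[ \int_{|t| \ge \sqrt{N}} |\tilde f_n(t)|^2\, dt \le C \sqrt{N}\, e^{-cN} \tag{11.5} \]

with $C = e^{2D+4}$ and $c = c_0 e^{-12D}$, where $c_0$ is an absolute positive constant.

Using a well-known bound $1 - \Phi(x) \le \frac{1}{x}\, \varphi(x)$ ($x > 0$), we easily get $\int_{|t| \ge \sqrt{N}} g(t)^2\, dt \le \frac{1}{\sqrt{N}}\, e^{-N}$.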
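The Gaussian tail integral can be evaluated as follows. This is a sketch with $g(t) = e^{-t^2/2}$, $\varphi$ the standard normal density and $\Phi$ its distribution function; the resulting constant is what the stated Mills-ratio bound yields and is not necessarily the form the authors displayed.

```latex
% Substituting t = x/\sqrt{2}:
\[ \int_{|t| \ge \sqrt{N}} g(t)^2 \, dt = \int_{|t| \ge \sqrt{N}} e^{-t^2} dt
   = \frac{1}{\sqrt 2} \int_{|x| \ge \sqrt{2N}} e^{-x^2/2} dx
   = 2 \sqrt{\pi}\, \big(1 - \Phi(\sqrt{2N})\big) . \]
% Applying 1 - \Phi(x) \le \varphi(x)/x with x = \sqrt{2N}:
\[ 2 \sqrt{\pi}\, \big(1 - \Phi(\sqrt{2N})\big)
   \ \le\ 2 \sqrt{\pi} \cdot \frac{1}{\sqrt{2N}} \cdot \frac{1}{\sqrt{2\pi}}\, e^{-N}
   = \frac{1}{\sqrt N}\, e^{-N} . \]
```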

Together with (11.4)-(11.5), and since one may always assume that co < ^, the latter gives 

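\[ \frac{1}{2}\, \|\tilde f_n - g\|_2^2 \le \int_{|t| \le \sqrt{N}} |f_n(t) - g(t)|^2\, dt + C \sqrt{N}\, e^{-cN} \tag{11.6} \]

with $D$-dependent constants $C = C_0 e^{2D}$ and $c = c_0 e^{-12D}$ (where $C_0$ and $c_0$ are numerical).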

A similar analysis based on the application of Proposition 8.3 (cf. Corollary 8.4) and Proposition 10.2 with $s = 1$ leads to an analogous estimate

\[ \frac{1}{2}\, \|(\tilde f_n)' - g'\|_2^2 \le \int_{|t| \le \sqrt{N}} |f_n'(t) - g'(t)|^2\, dt + C N^{3/2} e^{-cN}. \]

Together with (11.6) it may be applied in (11.3), and then we get

\[ \|\tilde F_n - \Phi\|_{\rm TV}^2 \le \int_{|t| \le \sqrt{N}} |f_n(t) - g(t)|^2\, dt + \int_{|t| \le \sqrt{N}} |f_n'(t) - g'(t)|^2\, dt + C N^{3/2} e^{-cN}. \]




It is time to appeal to the classical theorem on the approximation of $f_n$ by the characteristic function of the standard normal law, cf. e.g. [BR-R].

Lemma 11.1. Assume $L_3 \le 1$. Up to an absolute constant $A$, in the interval $|t| \le L_3^{-1/3}$ we have

\fn{t)-9{t)\<ALse-''/\ 
and similarly for the first three derivatives of $f_n - g$.

In fact, the above inequality holds in the larger interval \t\ < l/^iL^). But this will not be 
needed for the present formulation of Theorem 1.1. 

Thus, if in addition to the original condition $6 \le N \le \frac{1}{2\sigma^2}$ we require that $\sqrt{N} \le L_3^{-1/3}$, Lemma 11.1 may be applied, and we get

\[ \|\tilde F_n - \Phi\|_{\rm TV} \le A L_3 + C N^{3/2} e^{-cN}. \]

Using this together with (11.2) in (11.1), we arrive at

\[ \|F_n - \Phi\|_{\rm TV} \le A L_3 + C N^{3/2} e^{-cN}, \tag{11.7} \]

where $A$ is some positive absolute constant, while $C = C_0 e^{2D}$ and $c = c_0 e^{-12D}$, as before.



Proof of Theorem 1.1. To finish the argument, we may take $N = [\frac{1}{2} L_3^{-2/3}]$, so that $\sqrt{N} \le L_3^{-1/3}$. In view of the elementary bound $\sigma \le L_3^{1/3}$, the condition $N \le \frac{1}{2\sigma^2}$ is fulfilled, as well. Finally, the condition $N \ge 6$ just restricts us to smaller values of $L_3$, and, for example, $L_3 \le \frac{1}{64}$ would work. Indeed, in this case, $\frac{1}{2} L_3^{-2/3} \ge 8$, so $N \ge 8$.

Thus, if $L_3 \le \frac{1}{64}$, then (11.7) holds true. But since $N \ge \frac{1}{4} L_3^{-2/3}$, the last term in (11.7) is dominated by any power of $L_3$ (up to constants). For example, using $e^x \ge c_1 x^3$ ($x > 0$), we get

\[ N^{3/2} e^{-cN} \le \frac{1}{c_1 c^3}\, N^{-3/2} \le \frac{8}{c_1 c^3}\, L_3 = \frac{8}{c_1 c_0^3}\, e^{36D} L_3. \]

Hence, (11.7) implies

\[ \|F_n - \Phi\|_{\rm TV} \le C L_3, \tag{11.8} \]

with $C = C_0 e^{C_1 D}$, where $C_0, C_1$ are positive numerical constants.
Finally, if $L_3 > \frac{1}{64}$, (11.8) automatically holds with $C = 128$.
Thus, Theorem 1.1 is proved.
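The elementary facts behind the choice of $N$ can be collected in one place. The sketch below assumes the choice $N = [\frac12 L_3^{-2/3}]$, the threshold $L_3 \le \frac{1}{64}$, the normalization $\mathrm{Var}(S_n) = 1$, and the convention that the total variation norm of a difference of two probability measures is at most $2$; these are read off from the surrounding argument rather than restated by the authors at this point.

```latex
% (i) \sigma \le L_3^{1/3}: for every k,
\[ \sigma_k^3 = \big(\mathbf{E} X_k^2\big)^{3/2} \le \mathbf{E}|X_k|^3 \le L_3
   \quad\Longrightarrow\quad \frac{1}{2\sigma^2} \ \ge\ \tfrac12 L_3^{-2/3} \ \ge\ N . \]
% (ii) Range of t:
\[ \sqrt{N} \ \le\ 2^{-1/2} L_3^{-1/3} \ \le\ L_3^{-1/3} . \]
% (iii) Size of N when L_3 \le 1/64:
\[ L_3^{-2/3} \ge 64^{2/3} = 16 \quad\Longrightarrow\quad N = \big[\tfrac12 L_3^{-2/3}\big] \ge 8 \ge 6 . \]
% (iv) Since [x] \ge x/2 for x \ge 1:
\[ N \ \ge\ \tfrac14 L_3^{-2/3} \quad\Longrightarrow\quad N^{-3/2} \ \le\ 4^{3/2} L_3 = 8 L_3 . \]
% (v) Remaining range L_3 > 1/64:
\[ \|F_n - \Phi\|_{\rm TV} \ \le\ 2 = 128 \cdot \tfrac{1}{64} \ <\ 128\, L_3 . \]
```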

Note, however, that the inequality (11.7) contains more information in comparison with Theorem 1.1. Again assume, as above, that $L_3 \le \frac{1}{64}$ and take $N = [\frac{1}{2} L_3^{-2/3}]$. If $D \le \frac{1}{36} \log \frac{1}{L_3}$, then $cN \ge c_0 L_3^{1/3} \cdot \frac{1}{4} L_3^{-2/3} = c_0' L_3^{-1/3}$ and $C = C_0 e^{2D} \le C_0 L_3^{-1/18}$. Hence,

\[ C N^{3/2} e^{-cN} \le C_0 L_3^{-1/18} \cdot L_3^{-1} \exp\{-c_0' L_3^{-1/3}\} \le C_0' L_3 \]

with some absolute constant $C_0'$. As a result, (11.7) yields $\|F_n - \Phi\|_{\rm TV} \le (A + C_0') L_3$. If $L_3 > \frac{1}{64}$, (11.8) holds with $C = 128$, and we arrive at:

Theorem 11.2. Assume that independent random variables $X_k$ have mean zero and finite third absolute moments. If they satisfy $D(X_k) \le c \log \frac{1}{L_3}$ ($1 \le k \le n$), then

\[ \|F_n - \Phi\|_{\rm TV} \le C L_3, \tag{11.9} \]

where $C$ and $c$ are positive absolute constants. (One may take $c = \frac{1}{36}$.)

12. Proof of Theorem 1.2 and its Refinement 

In the proof of Theorem 1.2, we apply the general bound (3.1) of Proposition 3.1 to the modified densities $\tilde p_n$ constructed under the same constraints $m_0 = 3$ and $6 \le N \le \frac{1}{2\sigma^2}$ as in the proof of Theorem 1.1. It then gives

\[ \tilde D_n \le \alpha^2 + 4 \big(\|\tilde f_n - g_\alpha\|_2 + \|(\tilde f_n)''' - g_\alpha'''\|_2\big), \]

where $\tilde D_n$ is the relative entropy of $\tilde F_n$ with respect to $\Phi$, and

\[ g_\alpha(t) = g(t) \Big(1 + \alpha\, \frac{(it)^3}{3!}\Big), \qquad \alpha = \sum_{k=1}^n \mathbf{E}X_k^3. \]

As we know from Proposition 9.1, $\tilde D_n$ provides a good approximation for the entropic distance $D_n = D(S_n)$, namely

\[ |\tilde D_n - D_n| \le 2^{-(N-6)} N^4 (D + 1). \]

Hence,

\[ D_n \le \alpha^2 + 4 \big(\|\tilde f_n - g_\alpha\|_2 + \|(\tilde f_n)''' - g_\alpha'''\|_2\big) + 2^{-(N-6)} N^4 (D + 1). \tag{12.1} \]

On the other hand, the closeness of $f_n$ and $g_\alpha$ on relatively large intervals is provided by:

Lemma 12.1. Assume $L_4 \le 1$. Up to an absolute constant $A$, in the interval $|t| \le L_4^{-1/6}$
we have

\fn{t)-gait)\<AL^e-'"/\ (12.2) 
and similarly for the first four derivatives of $f_n - g_\alpha$.

Again, we refer to [BR-R], where one can find several variants of such bounds.
We also use the following elementary relations, cf. e.g. [Pe].

Lemma 12.2. $\alpha^2 \le L_3^2 \le L_4$.
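Reading Lemma 12.2 as the chain $\alpha^2 \le L_3^2 \le L_4$ with $\alpha = \sum_k \mathbf{E}X_k^3$ (an interpretation of the damaged display, consistent with the later use $|\alpha| \le 1$ when $L_4 \le 1$), both inequalities follow from the Cauchy-Schwarz inequality and the normalization $\sum_k \mathbf{E}X_k^2 = 1$. A minimal verification:

```latex
% First inequality:
\[ |\alpha| = \Big| \sum_{k=1}^n \mathbf{E} X_k^3 \Big| \ \le\ \sum_{k=1}^n \mathbf{E}|X_k|^3 = L_3 . \]
% Second inequality: Cauchy-Schwarz, first inside each expectation and then over the sum,
\[ L_3 = \sum_{k=1}^n \mathbf{E}\big(|X_k| \cdot X_k^2\big)
   \ \le\ \sum_{k=1}^n \big(\mathbf{E} X_k^2\big)^{1/2} \big(\mathbf{E} X_k^4\big)^{1/2}
   \ \le\ \Big(\sum_{k=1}^n \mathbf{E} X_k^2\Big)^{1/2} \Big(\sum_{k=1}^n \mathbf{E} X_k^4\Big)^{1/2}
   = L_4^{1/2} . \]
```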

Now, assume that $L_4 \le 1$. To estimate the $L^2$-norms in (12.1), again write

\[ \|\tilde f_n - g_\alpha\|_2^2 \le \int_{|t| \le \sqrt{N}} |\tilde f_n(t) - g_\alpha(t)|^2\, dt + 2 \int_{|t| > \sqrt{N}} |\tilde f_n(t)|^2\, dt + 2 \int_{|t| > \sqrt{N}} |g_\alpha(t)|^2\, dt. \tag{12.3} \]

Using $|\tilde f_n(t) - f_n(t)| \le 2^{-(N-2)} N^3$ and the inequality (12.2) with $|t| \le \sqrt{N} \le L_4^{-1/6}$, we
have

/ \Ut) - 9ait)? dt < 2/ \g(t)- fn{t)\^dt + 2 [ Unit) - gait)\^ dt 

J\t\<VN J\t\<y/N J\t\<VN 

< ^Ll + 2-(2^-5) iV7/2 (12.4) 

with some absolute constant A. 

The middle integral on the right-hand side of (12.3) has been already estimated in (11.5). In addition, using $t^6 g(t) \le 36$, we have

\[ |g_\alpha(t)|^2 = g(t)^2 \Big(1 + \alpha^2\, \frac{t^6}{36}\Big) \le (1 + \alpha^2)\, g(t) \le 2 g(t), \]

where we applied Lemma 12.2 together with the assumption $L_4 \le 1$ (so that $|\alpha| \le 1$). Hence,

\[ \int_{|t| \ge \sqrt{N}} |g_\alpha(t)|^2\, dt \le 2 \int_{|t| \ge \sqrt{N}} e^{-t^2/2}\, dt \le 2 e^{-N/2}. \]

One may combine this bound with (11.5) and (12.4), and then (12.3) gives 

||7n-<7a|li < ^Ll + 2-(2^-^)Ar7/2 + cViVe-^^ + 4e-^/2 

with C = e^^^'^ and c = cge"^^^ as in (11.5), where cq is an absolute positive constant. Since 
one may always choose cq < |, the above inequality may be simplified as 

\\Jn-9a\\2<AL4 + CN^" e-^^ 

with some absolute constant A and D-dependent constants C = Cqc^^ and c = coe"-^^^. 

By a similar analysis based on the application of Corollary 8.4 and Proposition 10.2 with $s = 3$ (cf. (10.10)), we also have an analogous estimate

\[ \|\tilde f_n''' - g_\alpha'''\|_2 \le A L_4 + C N^{7/4} e^{-cN}. \]

Hence, (12.1) together with Lemma 12.2 yields

\[ D_n \le A L_4 + C N^{7/4} e^{-cN}, \tag{12.5} \]

where $A$ is absolute, and $C = C_0 e^{2D}$ and $c = c_0 e^{-12D}$, as before. The obtained estimate holds true, as long as $6 \le N \le \frac{1}{2\sigma^2}$ and $\sqrt{N} \le L_4^{-1/6}$ with $L_4 \le 1$.

Proof of Theorem 1.2. The last condition, $\sqrt{N} \le L_4^{-1/6}$, is satisfied for $N = [\frac{1}{2} L_4^{-1/3}]$. Then, by the elementary bound $\sigma \le L_4^{1/4}$, we also have $N \le \frac{1}{2\sigma^2}$. The condition $N \ge 6$ restricts us to smaller values of $L_4$. If, for example, $L_4 \le 4^{-6}$, we have $\frac{1}{2} L_4^{-1/3} \ge 8$ and thus $N \ge 8$.

Thus, if $L_4 \le 4^{-6}$, then (12.5) holds true. But, since $N \ge \frac{1}{4} L_4^{-1/3}$, the last term in (12.5)
is dominated by any power of L4. In particular, using > cix^ {x > 0), we get 

e-^^ < ^ < i!^ L4 = A e'""" ^4. 

Cl& C\& CiCq 

Hence, (12.5) yields

\[ D_n \le C L_4 \tag{12.6} \]
with C = Cie^^ e^'^^ = Ci e^^-^, where Ci is an absolute constant.

Finally, for $L_4 > 4^{-6}$, one may use the relation $D_n \le D$ (according to the entropy power inequality), which shows that (12.6) holds with $C = 4^6 D$.

Thus, Theorem 1.2 is proved.

Now, again assume, as above, that $L_4 \le 4^{-6}$ and take $N = [\frac{1}{2} L_4^{-1/3}]$. If $D \le \frac{1}{48} \log \frac{1}{L_4}$, then $cN \ge c_0 L_4^{1/4} \cdot \frac{1}{4} L_4^{-1/3} = c_0' L_4^{-1/12}$ and $C = C_0 e^{2D} \le C_0 L_4^{-1/24}$. Hence,

\[ C N^{7/4} e^{-cN} \le C_0 L_4^{-1/24} \cdot L_4^{-7/12} \exp\{-c_0' L_4^{-1/12}\} \le C_0' L_4 \]

with some absolute constant $C_0'$. As a result, (12.5) yields $D_n \le (A + C_0') L_4$. If $L_4 > 4^{-6}$, (12.6) holds with $C = 4^6$, and we arrive at another variant of Theorem 1.2.

Theorem 12.3. Assume that independent random variables $X_k$ have mean zero and finite fourth absolute moments. If they satisfy $D(X_k) \le c \log \frac{1}{L_4}$ ($1 \le k \le n$), then

\[ D(S_n) \le C L_4, \]

where $C$ and $c$ are certain positive absolute constants. (One may take $c = \frac{1}{48}$.)
Let us illustrate this result in the scheme of weighted sums

\[ S_n = a_1 X_1 + \dots + a_n X_n \]

of independent identically distributed random variables $X_k$, such that $\mathbf{E}X_1 = 0$, $\mathbf{E}X_1^2 = 1$, and with coefficients such that $a_1^2 + \dots + a_n^2 = 1$. In this case $L_4 = \mathbf{E}X_1^4 \sum_{k=1}^n a_k^4$, and Theorem 12.3 is applicable, when the last sum is sufficiently small.

Corollary 12.4. Assume that $X_1$ has density with finite entropy, and let $\mathbf{E}X_1^4 < +\infty$. If the coefficients satisfy

\[ \sum_{k=1}^n a_k^4 \le \frac{1}{\mathbf{E}X_1^4}\, e^{-c D(X_1)}, \]

then

\[ D(S_n) \le C\, \mathbf{E}X_1^4 \sum_{k=1}^n a_k^4, \]

where $C$ and $c$ are positive absolute constants. (One may take $c = 48$.)

For example, in case of equal coefficients, so that $S_n = \frac{X_1 + \dots + X_n}{\sqrt{n}}$, the conclusion becomes

\[ D(S_n) \le \frac{C}{n}\, \mathbf{E}X_1^4, \qquad \text{for all } n \ge n_1, \]

which holds true with an absolute constant $C$ and $n_1 = e^{c D(X_1)}\, \mathbf{E}X_1^4$.

In this Section we give a few remarks about Theorems 1.1-1.2 for the case, where the densities of summands are bounded.
First, let us note that, if a random variable $X$ has an absolutely continuous distribution with a bounded density $p(x) \le M$, where $M$ is a constant, and if the variance $\sigma^2 = \mathrm{Var}(X)$ is finite ($\sigma > 0$), then $X$ has finite entropy, and moreover,

\[ D(X) \le \log\big(M \sigma \sqrt{2\pi e}\big). \tag{13.1} \]

Indeed, if $Z$ is a standard normal random variable, and assuming (without loss of generality) that $\sigma = 1$, we have

\[ D(X) = h(Z) - h(X) = \log \sqrt{2\pi e} + \int_{-\infty}^{+\infty} p(x) \log p(x)\, dx, \]

which immediately implies (13.1).
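For completeness, the short computation behind (13.1) for a general variance is sketched below, again assuming the definition $D(X) = h(Z_X) - h(X)$ with $Z_X$ normal and $\mathrm{Var}(Z_X) = \sigma^2$. The uniform example at the end, with interval length $\ell$ chosen here for illustration, shows that (13.1) can hold with equality.

```latex
% Since p \le M, the entropy is bounded from below:
\[ h(X) = -\int p \log p \, dx \ \ge\ -\log M \int p \, dx = -\log M . \]
% With h(Z_X) = \tfrac12 \log(2\pi e \sigma^2) = \log(\sigma\sqrt{2\pi e}),
\[ D(X) = h(Z_X) - h(X) \ \le\ \log\big(\sigma \sqrt{2\pi e}\big) + \log M
        = \log\big(M \sigma \sqrt{2\pi e}\big) . \]
% Example: X uniform on an interval of length \ell has M = 1/\ell and \sigma = \ell/\sqrt{12}, so
\[ M\sigma = \frac{1}{\sqrt{12}}, \qquad
   D(X) = \log\big(\sigma\sqrt{2\pi e}\big) - \log \ell = \log\Big(\frac{\sqrt{2\pi e}}{\sqrt{12}}\Big) , \]
% which is exactly the right-hand side of (13.1).
```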

It is also worth noticing that, similarly to $D$, the functional $X \to M\sigma$ is affine invariant, where $M = \mathrm{ess\,sup}_x\, p(x)$. Therefore, $M\sigma$ depends neither on the mean nor on the variance of $X$. In addition, one always has $M\sigma \ge \frac{1}{\sqrt{12}}$, and the equality is achieved only for $X$ which is uniformly distributed in a finite interval of the real line. (Without proof this lower bound is already mentioned in [St].)

Using (13.1), Theorems 1.1 and 1.2 admit formulations involving maximum of densities.

In the statement below, let $(X_k)_{1 \le k \le n}$ be independent random variables with mean zero and variances $\sigma_k^2 = \mathbf{E}X_k^2$ ($\sigma_k > 0$), such that $\sum_{k=1}^n \sigma_k^2 = 1$. Let $F_n$ be the distribution function of the sum $S_n = X_1 + \dots + X_n$.

Corollary 13.1. Assume that every $X_k$ has density bounded by $M_k$. If $\max_k M_k \sigma_k \le D$, then

\[ \|F_n - \Phi\|_{\rm TV} \le C L_3, \tag{13.2} \]

where the constant $C$ depends on $D$, only. Moreover,

\[ D(S_n) \le C L_4. \tag{13.3} \]



Moreover, one may take C = CqW^ with some positive absolute constants Co and c. 
In particular, consider the weighted sums

\[ S_n = a_1 X_1 + \dots + a_n X_n \]

of independent identically distributed random variables $X_k$, such that $\mathbf{E}X_1 = 0$, $\mathbf{E}X_1^2 = 1$, and with coefficients satisfying $a_1^2 + \dots + a_n^2 = 1$. If $X_1$ has density, bounded by $M$, (13.2)-(13.3) yield respectively

\[ \|F_n - \Phi\|_{\rm TV} \le C_M\, \mathbf{E}|X_1|^3 \sum_{k=1}^n |a_k|^3, \qquad D(S_n) \le C_M\, \mathbf{E}X_1^4 \sum_{k=1}^n a_k^4, \]

where $C_M$ depends on $M$, only. (One may take $C_M = C_0 M^c$.)

Moreover, in the i.i.d. case, where $S_n = \frac{X_1 + \dots + X_n}{\sqrt{n}}$, the last bound may also be written with an absolute constant $C$, i.e.,

\[ D(S_n) \le \frac{C}{n}\, \mathbf{E}X_1^4, \qquad \text{for all } n \ge n_1. \]

Here one may take $n_1 = (M \sqrt{2\pi e})^{48}\, \mathbf{E}X_1^4$.

Acknowledgement. We would like to thank M. Ledoux for pointing us to the relationship 
between Theorem 1.2 and the transportation inequality of E. Rio. 